Abstract: Secure multi-party computation is a central problem in modern cryptography. An important subclass consists of problems of the following form: Alice and Bob wish to produce sample(s) of a pair of jointly distributed random variables. Each party must learn nothing more about the other party's output than what its own output reveals. To aid in this, they have available a setup (correlated random variables whose distribution differs from the desired distribution) as well as unlimited noiseless communication. In this paper we present an upper bound on how efficiently a given setup can be used to produce samples from a desired distribution.

The key tool we develop is called tension, or more precisely, the region of tension. It measures how well the correlation between a pair of random variables can (or rather, cannot) be resolved into a piece of common information and other independent pieces of information. We show various properties of this region, including a crucial monotonicity property: a protocol between two parties can only lower the tension between their views (i.e., a (low) level of tension that was achievable before the protocol remains achievable after it, possibly along with new, lower levels). Then, by calculating bounds on the region of tension of various pairs of correlated random variables, we derive state-of-the-art bounds on the efficiency of producing samples from a desired distribution using a given setup.

Another important contribution of this work is to generalize the notion of common information of two dependent variables introduced by [Gacs-Korner, 1973]. They defined common information as the largest entropy rate of a common random variable that two parties, each observing one of the sources, can agree upon. It is well known that their common information captures only a limited form of dependence between the random variables and is zero in most cases of interest. Our generalization, which we call the Assisted Common Information system, lets us take into account "almost common" information ignored by Gacs-Korner common information. In the assisted common information system, a genie assists the parties in agreeing on a more substantial common random variable. We characterize the trade-off between the amount of communication from the genie and the quality of the common random variable produced, and we show that the optimal trade-off is essentially given by the region of tension. Connections to the Gray-Wyner system and Wyner's common information are also studied.

This work was presented in part at IEEE International Symposia

I. Introduction 

Secure multi-party computation is a central problem in modern cryptography. Roughly, its goal is to carry out computations on inputs distributed among two (or more) parties, so as to provide each of them with no more information than what their respective inputs and outputs reveal to them. Our focus in this paper is an important subclass of such problems, which we shall call secure 2-party sampling. Here the computation has no inputs, but the outputs to the parties are required to follow a given joint distribution (and each party should not learn anything more than its part of the output). We also restrict ourselves to the case of honest-but-curious adversaries. It is well known (see, for instance, [30] and references therein) that very few distributions can be sampled in this way unless the computation is aided by a setup: some correlated random variables that are given to the parties at the beginning of the protocol. The setup itself follows some distribution (X, Y) (Alice gets X and Bob gets Y), which differs from the desired distribution (U, V) (Alice gets U and Bob gets V). The fundamental question, then, is which setups (X, Y) can be used to securely sample which distributions (U, V), and how efficiently.

The feasibility question can be answered using combinatorial analysis (as was done, for instance, in [19]). Information-theoretic tools, on the other hand, have been put to good use to show bounds on the efficiency of protocols (e.g., [2], [7], [27], [15], [12], [5], [13], [29], [26]). Our work continues in this vein of using information theory to formulate and answer efficiency questions in cryptography. Specifically, we generalize the concept of common information [9] as defined by Gacs and Korner (GK) and use this generalization to establish upper bounds on the efficiency of secure sampling.

Finding a meaningful definition for the "common information" of a pair of dependent random variables X and Y has received much attention since the 1970s [9], [28], [31], [1], [33]. We propose a new measure, a three-dimensional region, which gives a detailed picture of the extent of common information of a pair. This gives us an expressive means to compare different pairs with each other, based on the shape and size of their respective regions. Besides the specific application to secure sampling discussed in this paper, we believe that our generalization may have applications in information theory, cryptography, game theory, and distributed control, where the role of dependent random variables and common randomness is well recognized.

Suppose X = (X', Q) and Y = (Y', Q), where X', Y', Q are independent. Then a natural measure of "common information" of X and Y is H(Q). Q is determined both by X and by Y. Further, conditioned on Q, there is no "residual information" that correlates X and Y, i.e., X - Q - Y is a Markov chain. One could extend this to arbitrary X, Y in a couple of natural ways. One approach, which corresponds to a definition of Gacs and Korner [9]¹, is to find the "largest" random variable Q that is determined by X alone as well as by Y alone (with probability 1):

$$C_{\mathrm{GK}}(X;Y) = \max_{p_{Q|XY}:\ H(Q|X)=H(Q|Y)=0} H(Q) = I(X;Y) - \min_{p_{Q|XY}:\ H(Q|X)=H(Q|Y)=0} I(X;Y|Q). \tag{1}$$

Note that in this case the common information is necessarily no more than the mutual information, and in general the gap is non-zero. That is, common information does not, in general, account for all the correlation between X and Y. An alternative generalization, which corresponds to the approach of Gray and Wyner [31]², is to consider the "smallest" random variable Q such that, conditioned on Q, there is no residual mutual information. Smallness of Q is measured here in terms of I(XY;Q):

$$C_{\mathrm{Wyner}}(X;Y) = \min_{p_{Q|XY}:\ X - Q - Y} I(XY;Q) = I(X;Y) + \min_{p_{Q|XY}:\ X - Q - Y} \big(I(Y;Q|X) + I(X;Q|Y)\big). \tag{2}$$

Note that in this case the common information is necessarily no less than the mutual information. When X and Y have the form X = (X', Q) and Y = (Y', Q) with X', Y', Q independent, common information has a unique interpretation, and $C_{\mathrm{GK}}(X;Y) = C_{\mathrm{Wyner}}(X;Y) = H(Q)$. Otherwise, there are several ways, lying between the extremes represented by these two measures, to define a random variable that captures the correlation between X and Y.

¹ This is not the definition of common information in [9], but the consequence of a non-trivial result in that work. The original definition, which is in terms of a communication problem, is detailed in Section III (along with our extensions).

² Again, the actual definition of [31], which is in terms of a source coding problem, is different. The expression above is a consequence of a result in [31]. The definition and results in [31] are described in Section IV.
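As a quick numerical sanity check of the second equality in (2), the following minimal Python sketch (standard library only) builds a hypothetical joint p.m.f. of (X, Y, Q) in which X - Q - Y holds by construction. It then compares I(XY;Q) with I(X;Y) + I(Y;Q|X) + I(X;Q|Y). The helper names `entropy` and `cmi`, all numbers, and the use of base-2 logarithms are illustrative assumptions.

```python
import math
from collections import defaultdict
from itertools import product


def entropy(pmf, idx):
    """Entropy in bits of the marginal of `pmf` on the coordinates in `idx`."""
    marg = defaultdict(float)
    for outcome, p in pmf.items():
        marg[tuple(outcome[i] for i in idx)] += p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)


def cmi(pmf, a, b, c=()):
    """Conditional mutual information I(A;B|C); a, b, c are tuples of coordinate indices."""
    return (entropy(pmf, a + c) + entropy(pmf, b + c)
            - entropy(pmf, a + b + c) - entropy(pmf, c))


# Hypothetical numbers: build (X, Y, Q) as p(q) p(x|q) p(y|q), so X - Q - Y holds.
p_q = [0.3, 0.7]
p_x_given_q = [[0.9, 0.1], [0.2, 0.8]]
p_y_given_q = [[0.6, 0.4], [0.15, 0.85]]
pmf = {(x, y, q): p_q[q] * p_x_given_q[q][x] * p_y_given_q[q][y]
       for x, y, q in product(range(2), repeat=3)}

X, Y, Q = (0,), (1,), (2,)
lhs = cmi(pmf, X + Y, Q)  # I(XY;Q)
rhs = cmi(pmf, X, Y) + cmi(pmf, Y, Q, X) + cmi(pmf, X, Q, Y)
print(f"I(XY;Q) = {lhs:.6f}, I(X;Y) + I(Y;Q|X) + I(X;Q|Y) = {rhs:.6f}")
print(f"I(X;Y|Q) = {cmi(pmf, X, Y, Q):.2e}")  # zero up to floating-point error
```

The two sides agree for every such Q, because I(XY;Q) = I(X;Y) + I(Y;Q|X) + I(X;Q|Y) - I(X;Y|Q) holds in general and the last term vanishes under the Markov condition.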

One way to view the new quantities we introduce is as capturing an entire spectrum of random variables that approximately capture the correlation between X and Y. In Section II we define a three-dimensional "region of tension" for X, Y, which measures how well the correlation between X and Y can be captured by a random variable. Figure 1 depicts this region schematically. Looking ahead, the figure marks the quantities $I(X;Y) - C_{\mathrm{GK}}(X;Y)$ and $C_{\mathrm{Wyner}}(X;Y) - I(X;Y)$. These illustrate, in terms of the region of tension, the gap between mutual information and the two notions of common information.

In Section III, we generalize the Gacs-Korner system, in terms of which $C_{\mathrm{GK}}$ is defined (see Figure 4), to the "Assisted Common Information system." We show that the associated rate regions are closely related to the region of tension (Corollary 3.2). In Section IV, we consider the Gray-Wyner system (which gives a generalization of $C_{\mathrm{Wyner}}$) and show that the rate region associated with this system is also closely related to the region of tension (Theorem 4.3). This clarifies the connection between $C_{\mathrm{GK}}$ and the Gray-Wyner system; in particular, previously known connections readily follow from our results. Further, we show how two quantities identified in recent work can be obtained in terms of the region of tension (Corollary 4.6). The first arises in the context of lossless coding with side information [20], and the second in the context of the Gray-Wyner system [17].

Quite apart from the information-theoretic questions related to common information, our motivating application for defining the region of tension is the cryptographic problem of bounding the efficiency of secure sampling described above. In Section V, we show that the region of tension of the views of two parties engaged in such a protocol can only monotonically lower (expand towards the origin) and never rise (shrink away from the origin). Thus, by comparing the regions for the target random variables and the given random variables, we obtain improved upper bounds on the efficiency with which one pair can be used to securely generate another pair. We also give an example where this upper bound strictly improves on prior work. The example is further interesting for two reasons. Firstly, it is based on natural correlated random variables that are widely studied (namely, variants of oblivious transfer). Secondly, the new upper bound we prove matches an easy lower bound and is therefore tight.

Outline: Section II defines the region of tension for a pair of correlated random variables and establishes some of its properties. Sections III and IV introduce the concepts of common information $C_{\mathrm{GK}}$ and $C_{\mathrm{Wyner}}$ in terms of the Gacs-Korner and Gray-Wyner systems (together with a new generalization of the former). They also establish the connections between these systems and the region of tension. Section V defines the secure sampling problem, proves a monotonicity property of the region of tension, and applies it to bound the efficiency of secure sampling. The reader may choose to read only Sections II, III and IV, or alternatively only Sections II and V.

II. Tension and the Region of Tension 

Now we introduce our main tool which generalizes 
GK common information and also serves as a measure 
of cryptographic complexity of securely sampling a pair 
of random variables. Intuitively, we measure how well 
common information captures (or does not capture) the 
mutual information between a pair of random variables 
(X,Y). 

A. Definitions 

Throughout this paper we concern ourselves with pairs of correlated finite random variables (X, Y) with joint distribution (p.m.f.) $p_{X,Y}$. $\mathcal{X}$ and $\mathcal{Y}$ denote the (finite) alphabets of X and Y, respectively. We let $\mathcal{V}_{X,Y}$ denote the set of all random variables Q jointly distributed with (X, Y), i.e., all conditional p.m.f.s $p_{Q|X,Y}$.

The total variation distance³ between two random variables X and X' over the same alphabet $\mathcal{X}$ is

$$\Delta(X, X') \triangleq \tfrac{1}{2}\|p_X - p_{X'}\|_1 = \tfrac{1}{2}\sum_{x \in \mathcal{X}} |p_X(x) - p_{X'}(x)|.$$

$H_2(\cdot)$ denotes the binary entropy function: $H_2(p) = p\log(1/p) + (1-p)\log(1/(1-p))$ for $0 < p < 1$, and $H_2(0) = H_2(1) = 0$.

³ In the cryptography literature, $\Delta(\cdot,\cdot)$ is more commonly called statistical difference.
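The following minimal Python sketch implements these two definitions. It assumes that p.m.f.s are given as dictionaries from outcomes to probabilities and that logarithms are taken to base 2 (the text leaves the base unspecified). The function names are hypothetical.

```python
import math


def total_variation(p, q):
    """Delta(X, X') = (1/2) * sum_x |p_X(x) - p_X'(x)|, for p.m.f.s given as dicts."""
    support = set(p) | set(q)  # outcomes missing from a dict have probability 0
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


def binary_entropy(p):
    """H_2(p) in bits, with the convention H_2(0) = H_2(1) = 0."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))


print(total_variation({"a": 0.5, "b": 0.5}, {"a": 0.75, "b": 0.25}))  # 0.25
print(binary_entropy(0.5))  # 1.0
```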



The characteristic bipartite graph of a pair of correlated random variables (X, Y) is the graph with vertex set $\mathcal{X} \cup \mathcal{Y}$ and an edge between $x \in \mathcal{X}$ and $y \in \mathcal{Y}$ if and only if $p_{XY}(x, y) > 0$. (See Figure 3 for an example.)

Now we give the main definitions of this section. 

Definition 2.1: For a pair of correlated random variables (X, Y) and $p_{Q|XY} \in \mathcal{V}_{X,Y}$, we say that Q perfectly resolves (X, Y) if I(X;Y|Q) = 0 and H(Q|X) = H(Q|Y) = 0. We say that (X, Y) is perfectly resolvable if there exists $p_{Q|XY} \in \mathcal{V}_{X,Y}$ such that Q perfectly resolves (X, Y).

If (X, Y) is perfectly resolvable, then their GK common 
information represents the entire mutual information 
between them (see (1)). We intend to measure the extent 
to which (X, Y) is not perfectly resolvable. Towards this 
we introduce a 3-dimensional measure called tension of 
(X, Y), defined as follows. 

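Definition 2.2: For a pair of correlated random variables (X, Y) and $p_{Q|XY} \in \mathcal{V}_{X,Y}$, the tension of (X, Y) given Q is denoted by $\mathfrak{T}(X;Y|Q) \in \mathbb{R}^3_+$ and defined as

$$\mathfrak{T}(X;Y|Q) \triangleq \big(I(Y;Q|X),\ I(X;Q|Y),\ I(X;Y|Q)\big).$$

The region of tension of (X, Y), denoted by $\mathfrak{T}(X;Y) \subseteq \mathbb{R}^3_+$, is defined as

$$\mathfrak{T}(X;Y) \triangleq \mathfrak{i}\big(\{\mathfrak{T}(X;Y|Q) : p_{Q|XY} \in \mathcal{V}_{X,Y}\}\big),$$

where $\mathfrak{i}(S)$ denotes the increasing hull of $S \subseteq \mathbb{R}^3_+$, defined as $\mathfrak{i}(S) = \{s \in \mathbb{R}^3_+ : \exists s' \in S \text{ s.t. } s \ge s'\}$.⁴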
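The tension vector of Definition 2.2 can be computed directly from a joint p.m.f. and a channel $p_{Q|XY}$. The following Python sketch is illustrative only. The dictionary representation, base-2 logarithms, the helper names, and the example pair (a doubly symmetric binary source with crossover probability 0.1) are all assumptions. The sketch also evaluates the three choices Q = Y, Q = X and constant Q, which land on the first, second and third coordinate axes, respectively.

```python
import math
from collections import defaultdict


def entropy(pmf, idx):
    """Entropy in bits of the marginal of `pmf` on the coordinates in `idx`."""
    marg = defaultdict(float)
    for outcome, p in pmf.items():
        marg[tuple(outcome[i] for i in idx)] += p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)


def cmi(pmf, a, b, c=()):
    """I(A;B|C); a, b, c are tuples of coordinate indices."""
    return (entropy(pmf, a + c) + entropy(pmf, b + c)
            - entropy(pmf, a + b + c) - entropy(pmf, c))


def tension(p_xy, channel):
    """T(X;Y|Q) = (I(Y;Q|X), I(X;Q|Y), I(X;Y|Q)).

    p_xy maps (x, y) -> probability; channel(x, y) returns a dict q -> p_{Q|XY}(q|x,y).
    """
    pmf = defaultdict(float)
    for (x, y), pxy in p_xy.items():
        for q, pq in channel(x, y).items():
            pmf[(x, y, q)] += pxy * pq
    X, Y, Q = (0,), (1,), (2,)
    return (cmi(pmf, Y, Q, X), cmi(pmf, X, Q, Y), cmi(pmf, X, Y, Q))


# Hypothetical pair: doubly symmetric binary source with crossover probability 0.1.
p_xy = {(0, 0): 0.45, (1, 1): 0.45, (0, 1): 0.05, (1, 0): 0.05}
t_y = tension(p_xy, lambda x, y: {y: 1.0})  # Q = Y: first axis
t_x = tension(p_xy, lambda x, y: {x: 1.0})  # Q = X: second axis
t_c = tension(p_xy, lambda x, y: {0: 1.0})  # constant Q: third axis
for t in (t_y, t_x, t_c):
    print(tuple(round(max(v, 0.0), 4) for v in t))  # clip tiny negative rounding errors
```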

Since we consider only random variables with finite alphabets $\mathcal{X}$ and $\mathcal{Y}$, it follows from Fenchel-Eggleston's strengthening of Caratheodory's theorem [6, pg. 310] that we can restrict ourselves to $p_{Q|XY} \in \mathcal{V}_{X,Y}$ with alphabet $\mathcal{Q}$ such that $|\mathcal{Q}| \le |\mathcal{X}||\mathcal{Y}| + 2$. More precisely,

$$\mathfrak{T}(X;Y) = \mathfrak{i}\big(\{\mathfrak{T}(X;Y|Q) : p_{Q|XY} \in \mathcal{P}_{X,Y}\}\big), \tag{3}$$

where $\mathcal{P}_{X,Y}$ is defined as the set of all conditional p.m.f.'s $p_{Q|X,Y}$ for which the alphabet $\mathcal{Q}$ of Q satisfies $|\mathcal{Q}| \le |\mathcal{X}||\mathcal{Y}| + 2$.

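We point out that $\mathfrak{T}(X;Y)$ intersects all three axes (e.g., consider Q = Y, Q = X and a constant Q, respectively). It will be of interest to consider the three axis intercepts of the boundary of $\mathfrak{T}(X;Y)$:

$$\begin{aligned}
\mathfrak{T}_1^{\mathrm{int}}(X;Y) &\triangleq \min\{r_1 : (r_1, 0, 0) \in \mathfrak{T}(X;Y)\},\\
\mathfrak{T}_2^{\mathrm{int}}(X;Y) &\triangleq \min\{r_2 : (0, r_2, 0) \in \mathfrak{T}(X;Y)\},\\
\mathfrak{T}_3^{\mathrm{int}}(X;Y) &\triangleq \min\{r_3 : (0, 0, r_3) \in \mathfrak{T}(X;Y)\}.
\end{aligned} \tag{4}$$

The use of min instead of inf anticipates Theorem 2.4, which shows that $\mathfrak{T}(X;Y)$ is closed.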
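Take a finite set S of tension vectors, for instance those obtained from a handful of candidate channels $p_{Q|XY}$. Membership in the increasing hull $\mathfrak{i}(S)$ and the axis intercepts of $\mathfrak{i}(S)$ are then easy to compute. Since $\mathfrak{i}(S) \subseteq \mathfrak{T}(X;Y)$, intercepts obtained this way are only upper bounds on those in (4). The sketch below shows hypothetical Python helpers that are not part of the paper, and the sample points are made up.

```python
import math


def in_increasing_hull(s, points, tol=1e-12):
    """True iff s >= s' componentwise for some s' in the finite set `points`."""
    return any(all(si >= pi - tol for si, pi in zip(s, p)) for p in points)


def axis_intercepts(points, tol=1e-12):
    """Axis intercepts of the increasing hull of a finite set of points in R^3_+.

    (r, 0, 0) lies in i(S) iff some s' in S has s'_2 = s'_3 = 0 and s'_1 <= r, so the
    intercept is the smallest such s'_1 (math.inf if no point of S lies on that axis).
    """
    intercepts = []
    for axis in range(3):
        others = [k for k in range(3) if k != axis]
        on_axis = [p[axis] for p in points if all(abs(p[k]) <= tol for k in others)]
        intercepts.append(min(on_axis, default=math.inf))
    return tuple(intercepts)


# Hypothetical tension vectors T(X;Y|Q) obtained from a few choices of Q.
S = [(0.47, 0.0, 0.0), (0.0, 0.47, 0.0), (0.0, 0.0, 0.53), (0.2, 0.2, 0.1)]
print(axis_intercepts(S))                       # (0.47, 0.47, 0.53)
print(in_increasing_hull((0.3, 0.3, 0.3), S))   # True: dominates (0.2, 0.2, 0.1)
print(in_increasing_hull((0.1, 0.1, 0.05), S))  # False
```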

⁴ For two vectors $(x,y,z), (x',y',z') \in \mathbb{R}^3_+$, we write $(x,y,z) \ge (x',y',z')$ to mean $x \ge x'$, $y \ge y'$ and $z \ge z'$.
Fig. 1: A schematic representation of the region $\mathfrak{T}(X;Y)$. $\mathfrak{T}(X;Y)$ is an unbounded, convex region, bounded away from the origin (unless (X, Y) is perfectly resolvable). The figure shows the relationship between two points on the boundary of $\mathfrak{T}(X;Y)$ and the quantities $C_{\mathrm{GK}}(X;Y)$ and $C_{\mathrm{Wyner}}(X;Y)$ (see (16) and (34)). (The dotted line is at 45° to the axes.)



B. Some Properties of Tension 

Firstly, we have an easy observation. 

Theorem 2.1: $\mathfrak{T}(X;Y)$ includes the origin if and only if the pair (X, Y) is perfectly resolvable.

Proof: We need to show that there exists $p_{Q|XY}$ such that I(Y;Q|X) = I(X;Q|Y) = I(X;Y|Q) = 0 if and only if there exists $p_{Q'|XY}$ such that H(Q'|X) = H(Q'|Y) = I(X;Y|Q') = 0. Clearly, the second condition implies the first by taking Q to be Q'. This is because $I(Y;Q'|X) \le H(Q'|X) = 0$, and similarly $I(X;Q'|Y) = 0$. The converse follows from Lemma A.1, which shows the following: given $p_{Q|XY}$ such that I(Y;Q|X) = I(X;Q|Y) = 0, we can find a random variable Q' with H(Q'|X) = H(Q'|Y) = 0 and Q - Q' - XY. Then, by Lemma A.2, I(X;Y|Q') ≤ I(X;Y|Q), and hence I(X;Y|Q) = 0 implies I(X;Y|Q') = 0. ■
The more interesting case is when $\mathfrak{T}(X;Y)$ does not contain the origin, and hence (X, Y) is not perfectly resolvable. It is important to consider all three coordinates of $\mathfrak{T}(X;Y|Q)$ together to identify the unresolvable nature of a pair (X, Y). As observed above, $\mathfrak{T}(X;Y)$ intersects each of the three axes. In other words, any two coordinates of $\mathfrak{T}(X;Y|Q)$ can be made simultaneously zero by choosing an appropriate Q.

As it turns out, the axis intercepts are identical to three quantities identified by Wolf and Wullschleger [29]. In [29] these quantities were defined as

$$H(X \searrow Y \mid Y), \qquad H(Y \searrow X \mid X), \qquad I(X;Y \mid X \wedge Y),$$

where $X \searrow Y$ stands for the part of X which depends on Y, and $X \wedge Y$ stands for the common information between X and Y. Here $X \searrow Y$ is a function of X that distinguishes two values of X if and only if they induce different conditional distributions on Y. $X \wedge Y$ is the "maximal" function of X that is also a function of Y, as discussed in more detail in Section III. More precisely, the three quantities considered there satisfy:

$$\begin{aligned}
H(Y \searrow X \mid X) &= \min_{p_{Q|XY}:\ H(Q|Y)=I(X;Y|Q)=0} H(Q|X),\\
H(X \searrow Y \mid Y) &= \min_{p_{Q|XY}:\ H(Q|X)=I(X;Y|Q)=0} H(Q|Y),\\
I(X;Y \mid X \wedge Y) &= \min_{p_{Q|XY}:\ H(Q|X)=H(Q|Y)=0} I(X;Y|Q).
\end{aligned}$$

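In the appendix we prove the following theorem, which shows that these three quantities are the same as $(\mathfrak{T}_1^{\mathrm{int}}(X;Y), \mathfrak{T}_2^{\mathrm{int}}(X;Y), \mathfrak{T}_3^{\mathrm{int}}(X;Y))$.

Theorem 2.2:

$$\mathfrak{T}_1^{\mathrm{int}}(X;Y) = \min_{p_{Q|XY}:\ H(Q|Y)=I(X;Y|Q)=0} H(Q|X), \tag{5}$$

$$\mathfrak{T}_2^{\mathrm{int}}(X;Y) = \min_{p_{Q|XY}:\ H(Q|X)=I(X;Y|Q)=0} H(Q|Y), \tag{6}$$

$$\mathfrak{T}_3^{\mathrm{int}}(X;Y) = \min_{p_{Q|XY}:\ H(Q|X)=H(Q|Y)=0} I(X;Y|Q). \tag{7}$$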
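Theorem 2.2, combined with the description of $Y \searrow X$ recalled above, suggests a direct way to evaluate the first two intercepts. Label each y by its conditional law $p_{X|Y=y}$ and compute $H(Y \searrow X \mid X)$; do the same with the roles of X and Y swapped for $\mathfrak{T}_2^{\mathrm{int}}$. The Python sketch below assumes that characterization. The function names, the dictionary representation, base-2 logarithms, and the rounding used to compare conditional laws are illustrative choices.

```python
import math
from collections import defaultdict


def cond_entropy(pmf):
    """H(A|B) in bits for a p.m.f. given as {(a, b): probability}."""
    p_b = defaultdict(float)
    for (_, b), p in pmf.items():
        p_b[b] += p
    return sum(p * math.log2(p_b[b] / p) for (_, b), p in pmf.items() if p > 0)


def law_label(p_xy, v, given_y, digits=12):
    """Hashable label of p_{X|Y=v} (given_y=True) or of p_{Y|X=v} (given_y=False).

    Probabilities are rounded so that equal conditional laws get equal labels.
    """
    row = {(x if given_y else y): p for (x, y), p in p_xy.items()
           if p > 0 and (y if given_y else x) == v}
    total = sum(row.values())
    return tuple(sorted((k, round(p / total, digits)) for k, p in row.items()))


def t1_int(p_xy):
    """T_1^int(X;Y) = H(Y↘X | X), where Y↘X labels y by p_{X|Y=y}."""
    pmf = defaultdict(float)
    for (x, y), p in p_xy.items():
        if p > 0:
            pmf[(law_label(p_xy, y, given_y=True), x)] += p
    return cond_entropy(pmf)


def t2_int(p_xy):
    """T_2^int(X;Y) = H(X↘Y | Y), where X↘Y labels x by p_{Y|X=x}."""
    pmf = defaultdict(float)
    for (x, y), p in p_xy.items():
        if p > 0:
            pmf[(law_label(p_xy, x, given_y=False), y)] += p
    return cond_entropy(pmf)


# Binary pair of Fig. 3 with p = 1/3 (X plays the role of U, Y of V).
fig3 = {(0, 0): 1 / 3, (1, 1): 1 / 3, (1, 0): 1 / 3, (0, 1): 0.0}
# Hypothetical pair in which x = 1 and x = 2 induce the same law on Y.
merged = {(0, 0): 1 / 3, (1, 1): 1 / 3, (2, 1): 1 / 3}
print(t1_int(fig3), t2_int(fig3))
print(t1_int(merged), t2_int(merged))
```

For the binary pair of Fig. 3 with p = 1/3, both intercepts come out as 2/3 bit. For the second pair, both intercepts are 0, which is consistent with Q = Y perfectly resolving that pair.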



Monotonicity of $\mathfrak{T}(X;Y)$: Wolf and Wullschleger showed that these three quantities have a certain "monotonicity" property: they can only decrease as X, Y evolve as the views of two parties in a secure protocol. We shall see that the monotonicity of all three quantities is a consequence of the monotonicity of the entire region $\mathfrak{T}(X;Y)$. We define the precise nature of this monotonicity in Section V-B and prove it for $\mathfrak{T}(X;Y)$ in Section V-C.

The following result (proven in Appendix A) will be useful in defining a "multiplication" operation on the region of tension as a scaling (see (44)). This, in turn, will be useful in relating the region of tension to the rate of secure sampling in Section V.

Theorem 2.3: The region $\mathfrak{T}(X;Y)$ is convex.

The following results will be important in extending the results of Section V to statistical security (rather than perfect security). Firstly, the region of tension is closed.

Theorem 2.4: The region $\mathfrak{T}(X;Y)$ is closed.

Proof: By (3) and the fact that the increasing hull of a compact set is closed (see Lemma A.3 in Appendix A), it is enough to show that $\{\mathfrak{T}(X;Y|Q) : p_{Q|XY} \in \mathcal{P}_{X,Y}\}$ is compact (i.e., closed and bounded, by the Heine-Borel theorem). For this, notice that $\mathfrak{T}(X;Y|Q)$, viewed as a function of $p_{Q|XY}$ from $\mathcal{P}_{X,Y}$ to $\mathbb{R}^3$, is continuous. Moreover, $\mathcal{P}_{X,Y}$ is compact. Since the image of a compact set under a continuous function is compact, $\{\mathfrak{T}(X;Y|Q) : p_{Q|XY} \in \mathcal{P}_{X,Y}\}$ is compact. ■
Secondly, the region of tension is continuous in the following sense: when the joint p.m.f. $p_{X,Y}$ is close to the joint p.m.f. $p_{X',Y'}$, the tension regions $\mathfrak{T}(X;Y)$ and $\mathfrak{T}(X';Y')$ are also close. We assume without loss of generality that the two joint p.m.f.'s are defined over the same alphabet $\mathcal{X} \times \mathcal{Y}$. We measure their closeness by their total variation distance $\Delta(XY, X'Y')$.

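Theorem 2.5: Suppose $\Delta(XY, X'Y') = \epsilon$ for some $\epsilon \in [0,1]$. Then $\mathfrak{T}(X;Y) \subseteq \mathfrak{T}(X';Y') - \delta(\epsilon)$, where $\delta(\epsilon) = 2H_2(\epsilon) + \epsilon \log \max\{|\mathcal{X}|, |\mathcal{Y}|\}$. Here, for $S \subseteq \mathbb{R}^3$ and $a \in \mathbb{R}$, the notation $S - a$ stands for $\{(r_1 - a, r_2 - a, r_3 - a) : (r_1, r_2, r_3) \in S\}$.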

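Proof: Suppose $(r_1, r_2, r_3) \in \mathfrak{T}(X;Y)$. We shall show that $(r_1 + \delta(\epsilon), r_2 + \delta(\epsilon), r_3 + \delta(\epsilon)) \in \mathfrak{T}(X';Y')$. Since $(r_1, r_2, r_3) \in \mathfrak{T}(X;Y)$, there is a $p_{Q|X,Y} \in \mathcal{V}_{X,Y}$ such that $I(Y;Q|X) \le r_1$, $I(X;Q|Y) \le r_2$, and $I(X;Y|Q) \le r_3$. Let $p_{Q'|X',Y'} = p_{Q|X,Y}$. It is enough to prove that

$$\begin{aligned}
I(Y';Q'|X') &\le I(Y;Q|X) + \delta(\epsilon),\\
I(X';Q'|Y') &\le I(X;Q|Y) + \delta(\epsilon),\\
I(X';Y'|Q') &\le I(X;Y|Q) + \delta(\epsilon).
\end{aligned}$$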

We will make use of the following lemma, which is proved in Appendix A.

Lemma 2.6: Suppose random variables (A, B, C) and (A', B', C') over the same alphabet $\mathcal{A} \times \mathcal{B} \times \mathcal{C}$ are such that $\Delta(ABC, A'B'C') = \epsilon$. Then $I(A';B'|C') \le I(A;B|C) + 2H_2(\epsilon) + \epsilon \log \min\{|\mathcal{A}|, |\mathcal{B}|\}$.

Since $p_{Q'|X',Y'} = p_{Q|X,Y}$, we have $\Delta(XYQ, X'Y'Q') = \Delta(XY, X'Y') = \epsilon$. We then invoke Lemma 2.6 three times, with (ABC, A'B'C') standing for (YQX, Y'Q'X'), (XQY, X'Q'Y') and (XYQ, X'Y'Q'), respectively. Finally, $\min\{|\mathcal{Y}|, |\mathcal{Q}|\}$, $\min\{|\mathcal{X}|, |\mathcal{Q}|\}$ and $\min\{|\mathcal{X}|, |\mathcal{Y}|\}$ are all upper bounded by $\max\{|\mathcal{X}|, |\mathcal{Y}|\}$. Together, these give the requisite bounds. ■
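The slack $\delta(\epsilon)$ in Theorem 2.5 is explicit, so it is easy to tabulate. The following is a small Python sketch that assumes base-2 logarithms. `delta` and `shift` are hypothetical helper names: `shift` maps a point of $\mathfrak{T}(X;Y)$ to the point that the theorem guarantees lies in $\mathfrak{T}(X';Y')$.

```python
import math


def binary_entropy(p):
    """H_2(p) in bits, with H_2(0) = H_2(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))


def delta(eps, size_x, size_y):
    """delta(eps) = 2 H_2(eps) + eps * log2(max(|X|, |Y|)), as in Theorem 2.5."""
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must lie in [0, 1]")
    return 2 * binary_entropy(eps) + eps * math.log2(max(size_x, size_y))


def shift(point, d):
    """Map (r1, r2, r3) in T(X;Y) to (r1 + d, r2 + d, r3 + d), a point of T(X';Y')."""
    return tuple(r + d for r in point)


d = delta(0.01, 2, 2)
print(round(d, 4))
print(shift((0.47, 0.0, 0.0), d))
```

For example, with $\epsilon = 0.01$ and binary alphabets, $\delta(\epsilon) = 2H_2(0.01) + 0.01 \approx 0.17$ bits. The slack vanishes as $\epsilon \to 0$, which is the continuity statement of the theorem.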



C. A Few Examples 

Obtaining closed-form expressions for the region $\mathfrak{T}(X;Y)$ can be difficult. However, for our applications it often suffices to identify parts of the boundary of $\mathfrak{T}(X;Y)$. We give a couple of examples below. A more detailed example appears in Section V-E.

Example 2.1: Figure 2 shows the joint p.m.f. of a pair 
of dependent random variables X, Y. 




Fig. 2: X, Y are dependent random variables whose joint p.m.f. is shown. The solid black lines each carry a probability mass of […] and the lighter ones […]. In the plot, all points $(R_1, R_2)$ on the dotted lines are such that $(R_1, R_2, 0) \in \mathfrak{T}(X;Y)$.

When $\delta = 0$, X and Y have the simple dependency structure X = (X', Q), Y = (Y', Q), where X', Y', Q are independent. This is the trivial case from the introduction: each observer can produce Q, which renders their observations conditionally independent, without any assistance from the genie. Thus, the set of rate pairs $(R_1, R_2)$ such that $(R_1, R_2, 0) \in \mathfrak{T}(X;Y)$ is the entire positive quadrant. For small values of $\delta$ we intuitively expect the random variables to be "close" to this case. A measure such as the common information of Gacs and Korner fails to bring this out: common information is discontinuous in $\delta$, jumping from H(Q) = 1 at $\delta = 0$ to 0 for $\delta > 0$. However, our trade-off regions bear out the intuition. For instance, for $\delta = 0.05$, Figure 2 shows that the set of rate pairs $(R_1, R_2)$ such that $(R_1, R_2, 0) \in \mathfrak{T}(X;Y)$ is nearly all of the positive quadrant.

Example 2.2: A binary example. Figure 3 shows the joint p.m.f. of a pair of dependent binary random variables U, V. The plot in Figure 3 shows the intersection of $\mathfrak{T}(U;V)$ with the plane z = 0.
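For the pair of Fig. 3, the characteristic bipartite graph has edges (0,0), (1,0) and (1,1), which connect all four vertices. Hence $C_{\mathrm{GK}}(U;V) = 0$, and by (16) the third intercept $\mathfrak{T}_3^{\mathrm{int}}(U;V)$ equals I(U;V). The following Python sketch computes I(U;V) as a function of p. It is illustrative only, uses base-2 logarithms, and its helper names are hypothetical. For p = 1/3 the result equals $\log_2 3 - 4/3 \approx 0.25$ bit.

```python
import math


def fig3_pmf(p):
    """Joint p.m.f. of Fig. 3: p(0,0) = p(1,1) = p, p(1,0) = 1 - 2p, p(0,1) = 0 (0 <= p <= 1/2)."""
    return {(0, 0): p, (1, 1): p, (1, 0): 1 - 2 * p, (0, 1): 0.0}


def mutual_information(p_uv):
    """I(U;V) in bits for a p.m.f. given as {(u, v): probability}."""
    p_u, p_v = {}, {}
    for (u, v), p in p_uv.items():
        p_u[u] = p_u.get(u, 0.0) + p
        p_v[v] = p_v.get(v, 0.0) + p
    return sum(p * math.log2(p / (p_u[u] * p_v[v]))
               for (u, v), p in p_uv.items() if p > 0)


for p in (0.1, 0.25, 1 / 3, 0.5):
    print(round(p, 4), round(mutual_information(fig3_pmf(p)), 4))
```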

III. Assisted Common Information 

Recall that when X = (X', Q) and Y = (Y', Q), where X', Y', Q are independent, a natural measure of "common information" of X and Y is H(Q). In this case, an observer of X and an observer of Y may independently produce the common part Q. Conditioned on Q, there is no "residual information" that correlates X and Y, i.e., I(X;Y|Q) = 0. The definition $C_{\mathrm{GK}}(X;Y)$ of Gacs and Korner [9] generalizes this to arbitrary X, Y (Figure 4(a)). The two observers now see $X^n = (X_1, \dots, X_n)$ and $Y^n = (Y_1, \dots, Y_n)$, respectively, where the $(X_i, Y_i)$ pairs are independent drawings of (X, Y). They are required to produce random variables $W_1 = f_1(X^n)$ and $W_2 = f_2(Y^n)$, respectively, which agree (with high probability). The largest entropy rate (i.e., entropy normalized by n) of such a "common" random variable was proposed as the common information of X and Y. We will refer to this as the GK common information of (X, Y) and denote it by $C_{\mathrm{GK}}(X;Y)$. However, in the same paper [9], Gacs and Korner showed (a result later strengthened by Witsenhausen [28]) that this rate is still just the largest H(Q) over random variables Q that can be obtained (with probability 1) both as a deterministic function of X alone and as a deterministic function of Y alone:

$$C_{\mathrm{GK}}(X;Y) = \max_{p_{Q|XY}:\ H(Q|X)=H(Q|Y)=0} H(Q).$$

It is easy to see that the above maximum is achieved by the random variable Q defined over the set of connected components of the characteristic bipartite graph of (X, Y), such that $p_{Q|XY}(q|x,y) = 1$ if and only if the edge (x, y) belongs to the connected component q. Note that this captures only an explicit form of common information in a single instance of (X, Y).

Fig. 3: U, V are binary random variables with joint p.m.f. p(0, 0) = p(1, 1) = p, p(1, 0) = 1 - 2p, and p(0, 1) = 0. The figure shows, for p = 1/3, the boundary of the set of all rate pairs $(R_1, R_2)$ such that $(R_1, R_2, 0) \in \mathfrak{T}(U;V)$. The marked point is the minimum sum-rate point.
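The connected-component construction above is straightforward to implement. The following Python sketch uses the standard library only; the function names and the union-find approach are our own choices, not taken from [9]. It labels each positive-probability edge with its component, computes $C_{\mathrm{GK}}(X;Y)$ as the entropy of that label, and, anticipating (16), reports $\mathfrak{T}_3^{\mathrm{int}}(X;Y) = I(X;Y) - C_{\mathrm{GK}}(X;Y)$. Logarithms are base 2.

```python
import math
from collections import defaultdict


def gk_components(p_xy):
    """Label each positive-probability edge (x, y) by its connected component.

    Vertices are ("x", x) and ("y", y); an edge joins them iff p_XY(x, y) > 0.
    """
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for (x, y), p in p_xy.items():
        if p > 0:
            rx, ry = find(("x", x)), find(("y", y))
            if rx != ry:
                parent[rx] = ry
    return {(x, y): find(("x", x)) for (x, y), p in p_xy.items() if p > 0}


def entropy(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)


def c_gk(p_xy):
    """C_GK(X;Y) = H(Q), where Q is the connected component containing the edge (X, Y)."""
    label = gk_components(p_xy)
    mass = defaultdict(float)
    for edge, p in p_xy.items():
        if p > 0:
            mass[label[edge]] += p
    return entropy(mass.values())


def mutual_information(p_xy):
    p_x, p_y = defaultdict(float), defaultdict(float)
    for (x, y), p in p_xy.items():
        p_x[x] += p
        p_y[y] += p
    return sum(p * math.log2(p / (p_x[x] * p_y[y]))
               for (x, y), p in p_xy.items() if p > 0)


# X = (X', Q), Y = (Y', Q) with X', Y', Q independent uniform bits: the trivial case.
trivial = {((a, q), (b, q)): 1 / 8 for a in (0, 1) for b in (0, 1) for q in (0, 1)}
# Fig. 3 pair with p = 1/3: a single connected component.
fig3 = {(0, 0): 1 / 3, (1, 1): 1 / 3, (1, 0): 1 / 3, (0, 1): 0.0}
for name, pmf in (("trivial", trivial), ("fig3", fig3)):
    t3 = mutual_information(pmf) - c_gk(pmf)  # T_3^int = I(X;Y) - C_GK(X;Y), as in (16)
    print(name, round(c_gk(pmf), 4), round(t3, 4))
```

In the trivial case, $C_{\mathrm{GK}} = I(X;Y) = 1$ bit and the third intercept is 0, so the pair is perfectly resolvable. For the Fig. 3 pair, $C_{\mathrm{GK}} = 0$ and the third intercept equals I(U;V).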

One limitation of the common information defined by Gacs and Korner is that it ignores information which is almost common.⁵ In particular, suppose the characteristic bipartite graph has only a single connected component. Then the common information is zero, even if removing a set of edges of small total probability mass would disconnect the graph into many components, each with significant probability mass. Our approach in this section can be viewed as a strict generalization of Gacs and Korner that uncovers such extra layers of "almost common information." Technically, we introduce an omniscient genie who has access to both observations $X^n$ and $Y^n$ and can send separate messages to the two observers over rate-limited noiseless links (see Figure 4(b)). The objective is for the observers to agree on a "common" random variable as before, but now with the genie's assistance. We call this the assisted common information system. It leads to a trade-off region that trades off the rates of the noiseless links against the resulting common information⁶ (or the resulting residual mutual information). We characterize these trade-off regions in terms of the region of tension of the two random variables. We show that, in general, they exhibit non-trivial behavior, but that they reduce to the trivial behavior discussed above when the rates of the noiseless links are zero.

⁵ Other approaches which do not necessarily suffer from this drawback have been suggested, notably [31], [1], [33]. As we show, our generalization is also intimately connected with [31].

Fig. 4: (a) Setup for Gacs-Korner common information. The observers generate $W_1$ and $W_2$, which are required to agree with high probability. (b) Assisted common information system. A genie assists the observers by sending separate messages to them over rate-limited noiseless links. When the genie is absent, the setup reduces to the one for Gacs-Korner common information.

As before, two observers receive $X^n = (X_1, \dots, X_n)$ and $Y^n = (Y_1, \dots, Y_n)$, respectively, and need to output strings $W_1$ and $W_2$, respectively, that must match each other with high probability. But here, an omniscient genie G computes $M_1 = f_1^{(n)}(X^n, Y^n)$ and $M_2 = f_2^{(n)}(X^n, Y^n)$ as deterministic functions of $(X^n, Y^n)$ and sends these to the two observers, as shown in Figure 4(b). The observers may also use the respective messages they receive from the genie when computing their outputs: $W_1 = g_1^{(n)}(X^n, M_1)$ and $W_2 = g_2^{(n)}(Y^n, M_2)$, where $g_1^{(n)}$ and $g_2^{(n)}$ are deterministic functions. Here again, the goal is to study how large the entropy of $W_1$ (and equivalently $W_2$) can be, while controlling the number of bits used to transmit $M_1$ and $M_2$.

⁶ We use the term common information primarily to maintain continuity with [9].

For a pair of random variables (X, Y) and positive integers $N_1, N_2, n$, an $(N_1, N_2, n)$ assisted common information (ACI) code is defined as a quadruple $(f_1^{(n)}, f_2^{(n)}, g_1^{(n)}, g_2^{(n)})$, where

$$\begin{aligned}
f_k^{(n)} &: \mathcal{X}^n \times \mathcal{Y}^n \to \{1, \dots, N_k\}, \quad k = 1, 2,\\
g_1^{(n)} &: \mathcal{X}^n \times \{1, \dots, N_1\} \to \mathbb{Z}, \text{ and}\\
g_2^{(n)} &: \mathcal{Y}^n \times \{1, \dots, N_2\} \to \mathbb{Z}
\end{aligned}$$

are deterministic functions. A sequence of $(N_1(n), N_2(n), n)$ ACI codes $(f_1^{(n)}, f_2^{(n)}, g_1^{(n)}, g_2^{(n)})_{n=1,2,\dots}$ is called a valid $(R_1, R_2)$ ACI strategy for (X, Y) if, for every $\epsilon > 0$ and sufficiently large n,

$$\frac{1}{n} \log N_k(n) \le R_k + \epsilon, \quad k = 1, 2, \tag{8}$$

$$\Pr\Big[g_1^{(n)}\big(X^n, f_1^{(n)}(X^n, Y^n)\big) \ne g_2^{(n)}\big(Y^n, f_2^{(n)}(X^n, Y^n)\big)\Big] \le \epsilon. \tag{9}$$

We say that a rate pair $(R_1, R_2)$ enables common information rate $R_{\mathrm{CI}}$ for (X, Y) if there exists a valid $(R_1, R_2)$ ACI strategy $(f_1^{(n)}, f_2^{(n)}, g_1^{(n)}, g_2^{(n)})_n$ for (X, Y) such that, for every $\epsilon > 0$ and sufficiently large n,

$$\frac{1}{n} H\Big(g_1^{(n)}\big(X^n, f_1^{(n)}(X^n, Y^n)\big)\Big) \ge R_{\mathrm{CI}} - \epsilon. \tag{10}$$

Similarly, we say that a rate pair $(R_1, R_2)$ enables residual information rate $R_{\mathrm{RI}}$ for (X, Y) if there exists a valid $(R_1, R_2)$ ACI strategy $(f_1^{(n)}, f_2^{(n)}, g_1^{(n)}, g_2^{(n)})_n$ for (X, Y) such that, for every $\epsilon > 0$ and sufficiently large n,

$$\frac{1}{n} I\Big(X^n; Y^n \,\Big|\, g_1^{(n)}\big(X^n, f_1^{(n)}(X^n, Y^n)\big)\Big) \le R_{\mathrm{RI}} + \epsilon. \tag{11}$$

Note that if $(R_1, R_2)$ enables residual information rate $R_{\mathrm{RI}}$ and $(R_1', R_2', R_{\mathrm{RI}}') \ge (R_1, R_2, R_{\mathrm{RI}})$, then $(R_1', R_2')$ also enables residual information rate $R_{\mathrm{RI}}'$.

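Definition 3.1: The assisted common information region $\mathcal{R}_{\mathrm{ACI}}(X;Y)$ of a pair of correlated random variables (X, Y) is the set of all $(R_1, R_2, R_{\mathrm{CI}}) \in \mathbb{R}^3_+$ such that $(R_1, R_2)$ enables common information rate $R_{\mathrm{CI}}$ for (X, Y). Similarly, the assisted residual information rate region $\mathcal{R}_{\mathrm{ARI}}(X;Y)$ of (X, Y) is the set of all $(R_1, R_2, R_{\mathrm{RI}}) \in \mathbb{R}^3_+$ such that $(R_1, R_2)$ enables residual information rate $R_{\mathrm{RI}}$ for (X, Y). In other words,

$$\begin{aligned}
\mathcal{R}_{\mathrm{ACI}}(X;Y) &= \{(R_1, R_2, R_{\mathrm{CI}}) : (R_1, R_2) \text{ enables common information rate } R_{\mathrm{CI}} \text{ for } (X, Y)\},\\
\mathcal{R}_{\mathrm{ARI}}(X;Y) &= \{(R_1, R_2, R_{\mathrm{RI}}) : (R_1, R_2) \text{ enables residual information rate } R_{\mathrm{RI}} \text{ for } (X, Y)\}.
\end{aligned}$$

We will write $\mathcal{R}_{\mathrm{ACI}}$ and $\mathcal{R}_{\mathrm{ARI}}$ when the random variables involved are clear from the context. It is easy to see from the definition that $\mathcal{R}_{\mathrm{ACI}}$ and $\mathcal{R}_{\mathrm{ARI}}$ are closed sets.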

Our main results regarding assisted common infor- 
mation system characterize the assisted residual and 
common information rate regions of (X, Y), and relate 
them to the region of tension of (X, Y). 

Recall that $\mathcal{P}_{X,Y}$ is the set of all conditional p.m.f.'s $p_{Q|X,Y}$ for which the alphabet $\mathcal{Q}$ of Q satisfies $|\mathcal{Q}| \le |\mathcal{X}||\mathcal{Y}| + 2$. We have the following characterization of the assisted common and residual information regions:

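Theorem 3.1:

$$\mathcal{R}_{\mathrm{ARI}}(X;Y) = \{(r_1, r_2, r_{\mathrm{RI}}) \in \mathbb{R}^3_+ : \exists\, p_{Q|XY} \in \mathcal{P}_{X,Y} \text{ s.t. } r_1 \ge I(Y;Q|X),\ r_2 \ge I(X;Q|Y),\ r_{\mathrm{RI}} \ge I(X;Y|Q)\},$$

$$\mathcal{R}_{\mathrm{ACI}}(X;Y) = \{(r_1, r_2, r_{\mathrm{CI}}) \in \mathbb{R}^3_+ : \exists\, p_{Q|XY} \in \mathcal{P}_{X,Y} \text{ s.t. } r_1 \ge I(Y;Q|X),\ r_2 \ge I(X;Q|Y),\ r_{\mathrm{CI}} \le I(X,Y;Q)\}.$$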

We prove this theorem in Section III-B. An immediate consequence is an interpretation of the region of tension $\mathfrak{T}(X;Y)$ as the assisted residual information region $\mathcal{R}_{\mathrm{ARI}}(X;Y)$. We may also express it in terms of the assisted common information region:

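Corollary 3.2: For any pair of correlated random variables (X, Y),

$$\mathfrak{T}(X;Y) = \mathcal{R}_{\mathrm{ARI}}(X;Y), \tag{12}$$

$$\mathfrak{T}(X;Y) = \mathfrak{i}\big(f_{X,Y}(\mathcal{R}_{\mathrm{ACI}}(X;Y))\big), \tag{13}$$

where $f_{X,Y}$ is an affine map defined as

$$f_{X,Y}\left(\begin{bmatrix} R_1 \\ R_2 \\ R_3 \end{bmatrix}\right) = \begin{bmatrix} R_1 \\ R_2 \\ I(X;Y) + R_1 + R_2 - R_3 \end{bmatrix}.$$

We prove (13) in Appendix B.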
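As a numerical illustration of the affine map in (13), consider the corner point $(I(Y;Q|X), I(X;Q|Y), I(X,Y;Q))$ that Theorem 3.1 places in $\mathcal{R}_{\mathrm{ACI}}(X;Y)$ for a given channel. Applying $f_{X,Y}$ to it gives exactly the tension vector $\mathfrak{T}(X;Y|Q)$, because $I(X;Y) + I(Y;Q|X) + I(X;Q|Y) - I(X,Y;Q) = I(X;Y|Q)$. The Python sketch below checks this for a hypothetical p.m.f. and channel, with base-2 logarithms. It illustrates the identity only and is not the proof of (13) in Appendix B.

```python
import math
from collections import defaultdict


def entropy(pmf, idx):
    """Entropy in bits of the marginal of `pmf` on the coordinates in `idx`."""
    marg = defaultdict(float)
    for outcome, p in pmf.items():
        marg[tuple(outcome[i] for i in idx)] += p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)


def cmi(pmf, a, b, c=()):
    """I(A;B|C); a, b, c are tuples of coordinate indices."""
    return (entropy(pmf, a + c) + entropy(pmf, b + c)
            - entropy(pmf, a + b + c) - entropy(pmf, c))


def f_xy(point, i_xy):
    """The affine map f_{X,Y}(R1, R2, R3) = (R1, R2, I(X;Y) + R1 + R2 - R3)."""
    r1, r2, r3 = point
    return (r1, r2, i_xy + r1 + r2 - r3)


# Hypothetical joint p.m.f. of (X, Y) and a hypothetical binary-output channel p_{Q|XY}.
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
prob_q0 = {(0, 0): 0.9, (0, 1): 0.5, (1, 0): 0.3, (1, 1): 0.05}  # P[Q = 0 | x, y]
pmf = {}
for (x, y), p in p_xy.items():
    pmf[(x, y, 0)] = p * prob_q0[(x, y)]
    pmf[(x, y, 1)] = p * (1 - prob_q0[(x, y)])

X, Y, Q = (0,), (1,), (2,)
aci_point = (cmi(pmf, Y, Q, X), cmi(pmf, X, Q, Y), cmi(pmf, X + Y, Q))
tension_point = (cmi(pmf, Y, Q, X), cmi(pmf, X, Q, Y), cmi(pmf, X, Y, Q))
mapped = f_xy(aci_point, cmi(pmf, X, Y))
print(mapped)
print(tension_point)
```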
A. Behavior at $R_1 = R_2 = 0$ and Connection to Gacs-Korner [9]

As discussed above, Gacs and Korner defined the common information $C_{\mathrm{GK}}(X;Y)$ using the system in Figure 4(a), where there is no genie. Formally, an n-GK map-pair $(g_1^{(n)}, g_2^{(n)})$ is a pair of maps $g_1^{(n)} : \mathcal{X}^n \to \mathbb{Z}$ and $g_2^{(n)} : \mathcal{Y}^n \to \mathbb{Z}$. We say that $R_{\mathrm{CI}}$ is an achievable common information rate for (X, Y) if there is a sequence of GK map-pairs $(g_1^{(n)}, g_2^{(n)})_{n=1,2,\dots}$ such that, for every $\epsilon > 0$ and large enough n,

$$\Pr\big[g_1^{(n)}(X^n) \ne g_2^{(n)}(Y^n)\big] \le \epsilon \quad \text{and} \quad \frac{1}{n} H\big(g_1^{(n)}(X^n)\big) \ge R_{\mathrm{CI}} - \epsilon.$$

The GK common information $C_{\mathrm{GK}}(X;Y)$ is the supremum of all achievable common information rates for (X, Y). As mentioned earlier, Gacs and Korner [9] showed that $C_{\mathrm{GK}}(X;Y)$ is simply H(Q), where Q corresponds to the connected component in the characteristic bipartite graph of (X, Y).

It is clear from the definition that $(0, 0, C_{\mathrm{GK}}(X;Y)) \in \mathcal{R}_{\mathrm{ACI}}(X;Y)$. However, it is not clear whether $C_{\mathrm{GK}}(X;Y)$ is the largest value of $R_{\mathrm{CI}}$ such that $(0, 0, R_{\mathrm{CI}}) \in \mathcal{R}_{\mathrm{ACI}}(X;Y)$. To make this precise, define $\mathcal{R}_{\mathrm{ACI}}^{\mathrm{int}}(X;Y)$ as the axis intercept of the boundary of $\mathcal{R}_{\mathrm{ACI}}(X;Y)$ along the $R_{\mathrm{CI}}$ axis:

$$\mathcal{R}_{\mathrm{ACI}}^{\mathrm{int}}(X;Y) \triangleq \max\{R_{\mathrm{CI}} : (0, 0, R_{\mathrm{CI}}) \in \mathcal{R}_{\mathrm{ACI}}(X;Y)\}.$$

It is then not immediately clear whether $C_{\mathrm{GK}}(X;Y) = \mathcal{R}_{\mathrm{ACI}}^{\mathrm{int}}(X;Y)$. This is because the absence of links from the genie is a more restrictive condition than allowing "zero-rate" links from the genie (notice the $\epsilon$ in (8)). So we may ask whether introducing an omniscient genie, but with zero-rate links to the observers, changes the conclusion of Gacs-Korner; in other words, whether $\mathcal{R}_{\mathrm{ACI}}^{\mathrm{int}}(X;Y)$ is larger than $C_{\mathrm{GK}}(X;Y)$. The corollary below (proven in Appendix B) answers this question in the negative. Note also that the result of Gacs-Korner can be obtained as a simple consequence of this corollary.

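Corollary 3.3:

$$C_{\mathrm{GK}}(X;Y) = R^{\mathrm{int}}_{\mathrm{ACI}}(X;Y) \tag{14}$$
$$= \max_{\substack{p_{Q|XY} \in \mathcal{P}_{X,Y}:\\ H(Q|X) = H(Q|Y) = 0}} H(Q). \tag{15}$$

Further,

$$T^{\mathrm{int}}_3(X;Y) = I(X;Y) - C_{\mathrm{GK}}(X;Y). \tag{16}$$

Thus, at zero rates for the links, assisted common information exhibits the same trivial behavior as $C_{\mathrm{GK}}$.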



B. Proof of Theorem 3.1

We first prove the converse (i.e., L.H.S. $\subseteq$ R.H.S.).

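Let $\epsilon > 0$, and let $n$ and an $(N_1(n), N_2(n), n)$ ACI code $(f_1^{(n)}, f_2^{(n)}, g_1^{(n)}, g_2^{(n)})$ be such that (8)-(10) hold. Let $C_k = f_k^{(n)}(X^n, Y^n)$, for $k = 1, 2$, and $W_1 = g_1^{(n)}(X^n, C_1)$ and $W_2 = g_2^{(n)}(Y^n, C_2)$. Then,

$$\begin{aligned}
R_1 + \epsilon &\ge \tfrac{1}{n} H(C_1) \ge \tfrac{1}{n} H(C_1|X^n) \ge \tfrac{1}{n} H(W_1|X^n) \\
&\ge \tfrac{1}{n} I(Y^n; W_1|X^n) \\
&= \tfrac{1}{n} \sum_{i=1}^{n} \big[ H(Y_i|Y^{i-1}, X^n) - H(Y_i|Y^{i-1}, X^n, W_1) \big] \\
&\overset{(a)}{\ge} \tfrac{1}{n} \sum_{i=1}^{n} \big[ H(Y_i|X_i) - H(Y_i|X_i, W_1, Y^{i-1}, X^{i-1}) \big] \\
&= \tfrac{1}{n} \sum_{i=1}^{n} I(Y_i; Q_i|X_i), \quad \text{where } Q_i = (W_1, Y^{i-1}, X^{i-1}), \\
&\overset{(b)}{=} I(Y_J; Q_J|X_J, J), \quad \text{where } p_J(i) = \tfrac{1}{n},\ i \in \{1, \ldots, n\}, \\
&\overset{(c)}{=} I(Y_J; Q|X_J), \quad \text{where } Q \triangleq (Q_J, J),
\end{aligned}$$

where (a) follows from the independence of the $(X_i, Y_i)$ pairs across $i$ (together with the fact that conditioning cannot increase entropy). In (b), we define $J$ to be a random variable uniformly distributed over $\{1, \ldots, n\}$ and independent of $(X^n, Y^n)$. And (c) follows from the independence of $J$ and $(X^n, Y^n)$. Similarly,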

$$\begin{aligned}
R_2 + \epsilon &\ge \tfrac{1}{n} H(C_2|Y^n) \ge \tfrac{1}{n} H(W_2|Y^n) \\
&= \tfrac{1}{n} H(W_1, W_2|Y^n) - \tfrac{1}{n} H(W_1|W_2, Y^n) \\
&\ge \tfrac{1}{n} H(W_1|Y^n) - \tfrac{1}{n} H(W_1|W_2) \\
&\overset{(a)}{\ge} \tfrac{1}{n} H(W_1|Y^n) - \kappa\epsilon \\
&\ge \tfrac{1}{n} I(X^n; W_1|Y^n) - \kappa\epsilon \qquad (17) \\
&\overset{(b)}{\ge} I(X_J; Q|Y_J) - \kappa\epsilon,
\end{aligned}$$

where (a), with $\kappa = 1 + \log|\mathcal{X}||\mathcal{Y}|$ and $n \ge 1/\epsilon$, follows from two facts: Fano's inequality, and the fact that the range of $g_1^{(n)}$ can be restricted without loss of generality to a set of cardinality $|\mathcal{X}|^n|\mathcal{Y}|^n$. And (b) can be shown along the same lines as the chain of inequalities which gave a lower bound for $R_1$ above. Moreover,

$$\begin{aligned}
\tfrac{1}{n} I(X^n; Y^n|W_1) &= \tfrac{1}{n} \sum_{i=1}^{n} I(X_i; Y^n|W_1, X^{i-1}) \\
&\ge \tfrac{1}{n} \sum_{i=1}^{n} I(X_i; Y_i|W_1, X^{i-1}, Y^{i-1}) \\
&= I(X_J; Y_J|Q).
\end{aligned}$$

Since $(X_J, Y_J)$ has the same joint distribution as $(X, Y)$, the converse for assisted residual information follows.
Similarly, the converse for assisted common information can be shown using

$$\tfrac{1}{n} H(W_1) \overset{(a)}{=} \tfrac{1}{n} I(X^n, Y^n; W_1) = \tfrac{1}{n} \sum_{i=1}^{n} \big[ H(X_i, Y_i) - H(X_i, Y_i|W_1, X^{i-1}, Y^{i-1}) \big] = \tfrac{1}{n} \sum_{i=1}^{n} I(X_i, Y_i; Q_i) = I(X_J, Y_J; Q),$$

where (a) follows from the fact that $W_1$ is a deterministic function of $(X^n, Y^n)$. We may restrict attention to $p_{Q|XY}$ with an alphabet $\mathcal{Q}$ satisfying $|\mathcal{Q}| \le |\mathcal{X}||\mathcal{Y}| + 2$. This follows from Fenchel-Eggleston's strengthening of Caratheodory's theorem [6, pg. 310].

To prove achievability (i.e., L.H.S. $\supseteq$ R.H.S.), we will use a result from lossy source coding. See, e.g., [4, Chapter 10] for a description of the lossy source coding problem. Consider a source $p_S$, and source and reconstruction alphabets $\mathcal{S}$ and $\hat{\mathcal{S}}$, respectively. We have the following lemma:



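Lemma 3.4: Given a conditional distribution $p^*_{\hat{S}|S}$, there is a distortion measure $d: \mathcal{S} \times \hat{\mathcal{S}} \to \mathbb{R}^+ \cup \{\infty\}$ and a distortion constraint $D$ such that $p^*_{\hat{S}|S}$ is a minimizer for

$$R(D) = \min_{p_{\hat{S}|S}:\ \mathbb{E}_{p_S p_{\hat{S}|S}}[d(S, \hat{S})] \le D} I(S; \hat{S}).$$

Moreover, unless $I(S; \hat{S}) = 0$ (in which case any $d$ works), the distortion measure $d$ is given by

$$d(s, \hat{s}) = -c \log p^*_{S|\hat{S}}(s|\hat{s}) + d_0(s), \tag{18}$$

where $c > 0$ and the function $d_0$ can be chosen arbitrarily, and

$$p^*_{S|\hat{S}}(s|\hat{s}) = \frac{p_S(s)\, p^*_{\hat{S}|S}(\hat{s}|s)}{\sum_{s'} p_S(s')\, p^*_{\hat{S}|S}(\hat{s}|s')}.$$

The distortion constraint is

$$D = \mathbb{E}_{p_S p^*_{\hat{S}|S}}\big[d(S, \hat{S})\big]. \tag{19}$$

Proof: See [6, Problem 3, pg. 147]; also see [10, Lemma 4] for a proof. ■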
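The construction in Lemma 3.4 can be checked numerically. The Python sketch below does three things:

1. It builds $d$ from (18) and $D$ from (19).
2. It applies one Blahut-Arimoto test-channel update $q(\hat{s}|s) \propto r(\hat{s})\, e^{-\beta d(s,\hat{s})}$, with slope $\beta = 1/c$ and output marginal $r = p^*_{\hat{S}}$.
3. It confirms that the update returns $p^*_{\hat{S}|S}$ unchanged.

Being a fixed point of this update is the usual stationarity condition behind constructions of this kind. The check also shows why the additive term $d_0(s)$ is immaterial: it cancels in the normalization over $\hat{s}$.

The sketch is our own. The helper names and the numbers are hypothetical, it uses natural logarithms, and it assumes a strictly positive $p^*_{\hat{S}|S}$, as in the Assumption used below.

```python
import math


def lemma34_distortion(p_s, p_hat_given_s, c=1.0, d0=None):
    """d(s, t) = -c ln p*(s | t) + d0(s) as in (18), and D = E[d] as in (19).

    p_s[s] is p_S(s); p_hat_given_s[s][t] is p*(t | s), assumed strictly positive.
    Returns (d, D, p_t), where p_t is the output marginal p*(t).
    """
    n_s, n_t = len(p_s), len(p_hat_given_s[0])
    d0 = d0 if d0 is not None else [0.0] * n_s
    p_t = [sum(p_s[s] * p_hat_given_s[s][t] for s in range(n_s)) for t in range(n_t)]
    # p*(s | t) = p_S(s) p*(t | s) / p*(t) by Bayes' rule.
    d = [[-c * math.log(p_s[s] * p_hat_given_s[s][t] / p_t[t]) + d0[s]
          for t in range(n_t)] for s in range(n_s)]
    big_d = sum(p_s[s] * p_hat_given_s[s][t] * d[s][t]
                for s in range(n_s) for t in range(n_t))
    return d, big_d, p_t


def blahut_arimoto_step(p_s, r, d, beta):
    """One test-channel update: q(t | s) proportional to r(t) exp(-beta d(s, t))."""
    q = []
    for s in range(len(p_s)):
        w = [r[t] * math.exp(-beta * d[s][t]) for t in range(len(r))]
        total = sum(w)
        q.append([x / total for x in w])
    return q


p_s = [0.3, 0.7]
p_star = [[0.8, 0.2], [0.1, 0.9]]
c = 2.0
d, big_d, p_t = lemma34_distortion(p_s, p_star, c=c, d0=[1.0, -0.5])
q = blahut_arimoto_step(p_s, p_t, d, beta=1.0 / c)
print(q)
```

With $c = 1$ and $d_0 = 0$, the constraint $D$ of (19) reduces to the conditional entropy $H(S|\hat{S})$ (in nats) under $p_S p^*_{\hat{S}|S}$.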
For a given $p_{Q|XY} \in \mathcal{P}_{X,Y}$ we need to argue that

$$(I(Y;Q|X),\, I(X;Q|Y),\, I(X,Y;Q)) \in \mathcal{R}_{\mathrm{ACI}}(X;Y),$$
$$(I(Y;Q|X),\, I(X;Q|Y),\, I(X;Y|Q)) \in \mathcal{R}_{\mathrm{ARI}}(X;Y),$$

where the conditional mutual information quantities are evaluated using the joint distribution $p_{XY} p_{Q|XY}$. Note that these quantities are continuous in $p_{Q|XY}$. Moreover, as was mentioned earlier, it is easy to verify from their definitions that $\mathcal{R}_{\mathrm{ACI}}(X;Y)$ and $\mathcal{R}_{\mathrm{ARI}}(X;Y)$ are closed sets. Hence, we may make the following assumption on $p_{Q|XY}$ without loss of generality:

Assumption: $p_{Q|XY}(q|x,y) > 0$ for all $(x, y, q) \in \mathcal{X} \times \mathcal{Y} \times \mathcal{Q}$.

In Lemma 3.4, let $p_S$ be $p_{XY}$ and $p^*_{\hat{S}|S}$ be $p_{Q|XY}$. Let $d: \mathcal{X} \times \mathcal{Y} \times \mathcal{Q} \to \mathbb{R}^+ \cup \{\infty\}$ denote the distortion measure and $D^*$ the distortion constraint promised by the lemma.



Let

$$d_{\max} = \max_{(x,y) \in \mathcal{X} \times \mathcal{Y}:\ p_{X,Y}(x,y) > 0}\ \max_{q \in \mathcal{Q}} d(x, y, q).$$

Under the above Assumption, it is clear from (18) that $d_{\max} < \infty$.




Fig. 5: Set up in the proof of Theorem 3.1 

The rest of the proof proceeds as follows. We will define a distributed source coding problem (see Figure 5) where the first goal is for the observers to agree on a common random variable, as in the assisted common information setup. However, instead of requiring this common random variable to meet (10) or (11), we will require that an output sequence $Q^n$, which is produced as a deterministic function of the common random variable, meet a distortion criterion. The distortion measure and the distortion constraint are those obtained above using Lemma 3.4. We will show that these requirements can be met using a code which operates at $(R_1, R_2) = (I(Y;Q|X), I(X;Q|Y))$. We will then argue that this must imply that the common random variable also meets (10) and (11).

We make the following definitions (see Figure 5). We define an $(N, N_1, N_2, n)$ code as a quintuple $(f_1^{(n)}, f_2^{(n)}, g_1^{(n)}, g_2^{(n)}, h^{(n)})$, where

$$\begin{aligned}
f_k^{(n)} &: \mathcal{X}^n \times \mathcal{Y}^n \to \{1, \ldots, N_k\}, \quad k = 1, 2, \\
g_1^{(n)} &: \mathcal{X}^n \times \{1, \ldots, N_1\} \to \{1, \ldots, N\}, \\
g_2^{(n)} &: \mathcal{Y}^n \times \{1, \ldots, N_2\} \to \{1, \ldots, N\}, \text{ and} \\
h^{(n)} &: \{1, \ldots, N\} \to \mathcal{Q}^n
\end{aligned}$$

are deterministic functions. Note that embedded in this code is an $(N_1, N_2, n)$ ACI code. The probability of error of a code is defined as

$$P_e^{(n)} = \Pr\big[g_1^{(n)}(X^n, f_1^{(n)}(X^n, Y^n)) \neq g_2^{(n)}(Y^n, f_2^{(n)}(X^n, Y^n))\big]. \tag{20}$$

Let

$$Q^n = h^{(n)}\big(g_1^{(n)}(X^n, f_1^{(n)}(X^n, Y^n))\big).$$



For $D > 0$, we will say that $(R_1, R_2, D)$ is achievable if there is a sequence of $(N(n), N_1(n), N_2(n), n)$ codes such that for every $\epsilon > 0$, for sufficiently large $n$,

$$\tfrac{1}{n} \log N_k(n) \le R_k + \epsilon, \quad k = 1, 2, \tag{21}$$
$$P_e^{(n)} \le \epsilon, \tag{22}$$

and the following average distortion constraint holds:

$$\tfrac{1}{n} \sum_{i=1}^{n} \mathbb{E}\big[d(X_i, Y_i, Q_i)\big] \le D + \epsilon. \tag{23}$$

The rate-distortion tradeoff region $\mathcal{R}$ is the closure of the set of all achievable $(R_1, R_2, D)$.

The following lemma is proved in Appendix B using standard techniques from distributed source coding theory (see, for instance, [8, Chapter 11]).

Lemma 3.5:

$$(I(Y;Q|X),\, I(X;Q|Y),\, D^*) \in \mathcal{R},$$

where the conditional mutual informations are evaluated using $p_{XY} p_{Q|XY}$ and $D^*$ is given by (19).

As mentioned above, every code has an ACI code embedded in it. We will show below that if a code satisfies (23) with $D = D^*$ of (19), then it must satisfy condition (10) on the common information rate. More precisely:

Claim 1: If a sequence of $(N(n), N_1(n), N_2(n), n)$ codes satisfies (23) with $D = D^*$, then it must hold that for sufficiently large $n$,

$$\tfrac{1}{n} H\big(g_1^{(n)}(X^n, f_1^{(n)}(X^n, Y^n))\big) \ge I(X,Y;Q) - \delta(\epsilon),$$

where $\delta(\epsilon) \downarrow 0$ as $\epsilon \downarrow 0$, and the mutual information on the right-hand side is evaluated using the joint distribution $p_{XY} p_{Q|XY}$.

Proof of Claim 1: Suppose (23) holds with $D = D^*$. Let $W_1 = g_1^{(n)}(X^n, f_1^{(n)}(X^n, Y^n))$. Then,

$$\begin{aligned}
H(W_1) &\ge I(W_1; X^n Y^n) \\
&\overset{(a)}{\ge} I(Q^n; X^n Y^n) \\
&= \sum_{i=1}^{n} I(Q^n; X_i Y_i|X^{i-1} Y^{i-1}) \\
&= \sum_{i=1}^{n} I(Q^n X^{i-1} Y^{i-1}; X_i Y_i) \\
&\ge \sum_{i=1}^{n} I(Q_i; X_i Y_i), \qquad (24)
\end{aligned}$$

where (a) is a data processing inequality, and the second equality uses the independence of the pairs $(X_i, Y_i)$ across $i$. Before we proceed further, we state some simple properties of the rate-distortion function from lossy source coding:

$$R(D) = \min_{p_{Q|XY}:\ \mathbb{E}[d(X,Y,Q)] \le D} I(Q; X, Y).$$

$R(D)$ is a continuous, convex, and non-increasing function of $D$. A proof can be found, for instance, in [4].
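Let

$$D_i = \mathbb{E}\big[d(X_i, Y_i, Q_i)\big].$$

Then

$$R(D_i) \le I(Q_i; X_i Y_i).$$

Substituting in (24),

$$H(W_1) \ge \sum_{i=1}^{n} R(D_i) \overset{(a)}{\ge} n R\Big(\tfrac{1}{n} \sum_{i=1}^{n} D_i\Big) \overset{(b)}{\ge} n\big(R(D^*) - \delta(\epsilon)\big), \tag{25}$$

where $\delta(\epsilon) \downarrow 0$ as $\epsilon \downarrow 0$. Step (a) is Jensen's inequality, using the convexity of $R(\cdot)$. Step (b) follows because the code satisfies (23) with $D = D^*$ and $R(D)$ is a continuous and non-increasing function of $D$.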
Let us recall that $d$ and $D^*$ were provided by Lemma 3.4, which guarantees that

$$R(D^*) = I(X,Y;Q),$$

where the mutual information is evaluated using the joint distribution $p_{XY} p_{Q|XY}$. Substituting this into (25) and dividing by $n$, we get Claim 1. ■
Further, the conditions (21)-(22) on the rates and probability of error of a sequence of codes are identical to the conditions (8)-(9) for a valid ACI strategy. Hence, we may conclude from Lemma 3.5 (together with Claim 1) that

$$(I(Y;Q|X),\, I(X;Q|Y),\, I(X,Y;Q)) \in \mathcal{R}_{\mathrm{ACI}}(X;Y).$$

To see this, fix any $\epsilon' > 0$ and choose a small enough $\epsilon > 0$ such that $\epsilon' > \max(\epsilon, \delta(\epsilon))$. Lemma 3.5 promises us an $(N(n), N_1(n), N_2(n), n)$ code such that (21)-(23) are met. This implies that (8)-(9) are met with $\epsilon'$. Moreover, Claim 1 implies that (10) is also met with $\epsilon'$. This completes the characterization of $\mathcal{R}_{\mathrm{ACI}}(X;Y)$.

To complete the characterization of $\mathcal{R}_{\mathrm{ARI}}(X;Y)$, fix $\epsilon' > 0$ and choose $\epsilon > 0$ small enough such that $\epsilon' > (3 + \log|\mathcal{X}||\mathcal{Y}|)\epsilon + \delta(\epsilon)$. Let us consider the $(N(n), N_1(n), N_2(n), n)$ code promised by Lemma 3.5, which satisfies (21)-(23) with $R_1 = I(Y;Q|X)$, $R_2 = I(X;Q|Y)$, and $D = D^*$. Let $W_1 = g_1^{(n)}(X^n, f_1^{(n)}(X^n, Y^n))$. We have the following information theoretic identity (see (52) on page 21):

$$I(X^n; Y^n|W_1) = I(X^n; Y^n) + I(X^n; W_1|Y^n) + I(Y^n; W_1|X^n) - I(X^n Y^n; W_1). \tag{26}$$

But,

$$I(Y^n; W_1|X^n) = I\big(Y^n; g_1^{(n)}(X^n, f_1^{(n)}(X^n, Y^n))\,\big|\,X^n\big) \le I\big(Y^n; f_1^{(n)}(X^n, Y^n)\,\big|\,X^n\big) \le \log N_1(n). \tag{27}$$

Using (22) and following the same argument which led us to (17), we can write

$$I(X^n; W_1|Y^n) \le \log N_2(n) + n\kappa\epsilon, \tag{28}$$

where $\kappa = 1 + \log|\mathcal{X}||\mathcal{Y}|$. Further, by Claim 1,

$$I(X^n Y^n; W_1) = H(W_1) \ge n\big(I(X,Y;Q) - \delta(\epsilon)\big). \tag{29}$$

Substituting the above three bounds in (26) and using (21) with $R_1 = I(Y;Q|X)$ and $R_2 = I(X;Q|Y)$,

$$\begin{aligned}
\tfrac{1}{n} I(X^n; Y^n|W_1) &\le I(X;Y) + I(Y;Q|X) + I(X;Q|Y) - I(X,Y;Q) + (\kappa + 2)\epsilon + \delta(\epsilon) \\
&\le I(X;Y|Q) + \epsilon', \qquad (30)
\end{aligned}$$

where the last step uses (52) again, together with the choice of $\epsilon$. Hence, we may conclude that

$$(I(Y;Q|X),\, I(X;Q|Y),\, I(X;Y|Q)) \in \mathcal{R}_{\mathrm{ARI}}(X;Y).$$

This completes the characterization of $\mathcal{R}_{\mathrm{ARI}}(X;Y)$.

IV. The Gray-Wyner System and its 
Relationship to Region of Tension and 
Assisted Common Information 

A. Gray-Wyner system 




Fig. 6: Setup for Gray-Wyner (GW) system. 



The Gray-Wyner system is shown in Figure 6. It is a source coding problem in which an encoder observes the pair of correlated sources $(X^n, Y^n)$ and maps it to three messages:

- two "private" messages $M_A = f_A^{(n)}(X^n, Y^n)$ and $M_B = f_B^{(n)}(X^n, Y^n)$;
- a "common" message $M_C = f_C^{(n)}(X^n, Y^n)$.

There are two decoders, which attempt to recover $X^n$ and $Y^n$, respectively. The first decoder estimates $X^n$ from the private message $M_A$ and the common message $M_C$ as $\hat{X}^n = g_{AC}^{(n)}(M_A, M_C)$. The second decoder estimates $Y^n$ from $M_B$ and $M_C$ as $\hat{Y}^n = g_{BC}^{(n)}(M_B, M_C)$. The Gray-Wyner problem is to characterize the rates of the messages so that the decoders estimate their sources losslessly.

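More precisely, for a pair of random variables $(X, Y)$, an $(N_A, N_B, N_C, n)$ GW code $(f_A^{(n)}, f_B^{(n)}, f_C^{(n)}, g_{AC}^{(n)}, g_{BC}^{(n)})$ is such that

$$\begin{aligned}
f_\alpha^{(n)} &: \mathcal{X}^n \times \mathcal{Y}^n \to \{1, \ldots, N_\alpha\}, \quad \alpha = A, B, C, \\
g_{AC}^{(n)} &: \{1, \ldots, N_A\} \times \{1, \ldots, N_C\} \to \mathcal{X}^n, \text{ and} \\
g_{BC}^{(n)} &: \{1, \ldots, N_B\} \times \{1, \ldots, N_C\} \to \mathcal{Y}^n
\end{aligned}$$

are deterministic functions. We say that $(R_A, R_B, R_C)$ is achievable in the Gray-Wyner system for $(X, Y)$ if there is a sequence of $(N_A(n), N_B(n), N_C(n), n)$ GW codes $(f_A^{(n)}, f_B^{(n)}, f_C^{(n)}, g_{AC}^{(n)}, g_{BC}^{(n)})$ such that for every $\epsilon > 0$, for large enough $n$,

$$\tfrac{1}{n} \log N_\alpha(n) \le R_\alpha + \epsilon, \quad \alpha = A, B, C,$$
$$\Pr\big[g_{AC}^{(n)}(f_A^{(n)}(X^n, Y^n), f_C^{(n)}(X^n, Y^n)) \neq X^n\big] < \epsilon,$$
$$\Pr\big[g_{BC}^{(n)}(f_B^{(n)}(X^n, Y^n), f_C^{(n)}(X^n, Y^n)) \neq Y^n\big] < \epsilon.$$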

Definition 4.1: The Gray-Wyner region $\mathcal{R}_{\mathrm{GW}}(X;Y)$ is the closure of the set of all rate 3-tuples that are achievable in the Gray-Wyner system for $(X, Y)$.

We write $\mathcal{R}_{\mathrm{GW}}$ when the random variables are clear from the context.

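A simple bound on $\mathcal{R}_{\mathrm{GW}}(X;Y)$ is given by $\mathcal{R}_{\mathrm{GW}}(X;Y) \subseteq \mathcal{L}_{\mathrm{GW}}(X;Y)$, where

$$\mathcal{L}_{\mathrm{GW}}(X;Y) \triangleq \{(R_A, R_B, R_C) : R_A + R_C \ge H(X),\ R_B + R_C \ge H(Y),\ R_A + R_B + R_C \ge H(X,Y)\}. \tag{31}$$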



The Gray-Wyner region was characterized in [11].

Theorem 4.1 ([11]): $\mathcal{R}_{\mathrm{GW}}(X;Y)$ equals

$$\uparrow\!\{(H(X|Q),\, H(Y|Q),\, I(X,Y;Q)) : p_{Q|XY} \in \mathcal{P}_{X,Y}\}.$$



Wyner's common information [31], $C_{\mathrm{Wyner}}(X;Y)$, of a pair of random variables $(X, Y)$ is defined in terms of the Gray-Wyner system. It is the smallest $R_C$ such that the outputs of the encoder taken together form an asymptotically efficient representation of $(X, Y)$, i.e., when $R_A + R_B + R_C = H(X,Y)$. Using the above theorem we have

Theorem 4.2 ([31]):

$$C_{\mathrm{Wyner}}(X;Y) = \inf_{\substack{(R_A, R_B, R_C) \in \mathcal{R}_{\mathrm{GW}}(X;Y):\\ R_A + R_B + R_C = H(X,Y)}} R_C = \min_{\substack{p_{Q|XY} \in \mathcal{P}_{X,Y}:\\ X - Q - Y}} I(X,Y;Q).$$



It is known that the Gacs-Korner common information can be obtained from the Gray-Wyner region [6, Problem 4.28, pg. 404]:

$$C_{\mathrm{GK}}(X;Y) = \max_{\substack{(R_A, R_B, R_C) \in \mathcal{R}_{\mathrm{GW}}:\\ R_A + R_C = H(X),\ R_B + R_C = H(Y)}} R_C. \tag{32}$$



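Alternatively [17],

$$C_{\mathrm{GK}}(X;Y) = \max\big\{R : R \le I(X;Y),\ \{R_C = R\} \cap \mathcal{L}_{\mathrm{GW}}(X;Y) \subseteq \mathcal{R}_{\mathrm{GW}}(X;Y)\big\}. \tag{33}$$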



B. New Connections 

Analogous to Corollary 3.2, the following theorem 
(proved in the appendix) shows that the region of tension 
of (X, Y) can be expressed in terms of their Gray-Wyner 
region. 

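Theorem 4.3:

$$\mathfrak{T}(X;Y) = \uparrow\!\big(g_{X,Y}(\mathcal{R}_{\mathrm{GW}}(X;Y))\big),$$

where $g_{X,Y}$ is an affine map defined as

$$g_{X,Y}\begin{pmatrix} R_A \\ R_B \\ R_C \end{pmatrix} = \begin{pmatrix} R_A + R_C - H(X) \\ R_B + R_C - H(Y) \\ R_A + R_B + R_C - H(X,Y) \end{pmatrix}.$$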



Thus, the tension region $\mathfrak{T}(X;Y)$ is the increasing hull of the Gray-Wyner region $\mathcal{R}_{\mathrm{GW}}(X;Y)$ under an affine map $g_{X,Y}$. The map, in fact, computes the gap of $\mathcal{R}_{\mathrm{GW}}(X;Y)$ to the simple lower bound $\mathcal{L}_{\mathrm{GW}}(X;Y)$ of (31).

- The first coordinate of $g_{X,Y}$ is the gap between the (sum) rate at which the first decoder in the Gray-Wyner system receives data and the minimum possible rate at which it may receive data and still losslessly reproduce $X^n$.
- The second coordinate has a similar interpretation with respect to the second decoder.
- The third coordinate is the gap between the rate at which the encoder sends data and the minimum possible rate at which it may transmit while still allowing both decoders to losslessly reproduce their respective sources.
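Theorem 4.3 can be checked on the corner points of Theorem 4.1. For any $p_{Q|XY}$, the map $g_{X,Y}$ sends $(H(X|Q), H(Y|Q), I(X,Y;Q))$ exactly to $(I(Y;Q|X), I(X;Q|Y), I(X;Y|Q))$.

The Python sketch below is our own numerical illustration. The function names and the joint pmf are assumptions chosen for the example: a doubly symmetric binary source with crossover 0.2, and $Q$ a noisy copy of $X \oplus Y$. Logarithms are base 2.

```python
import math
from collections import defaultdict


def entropy(p, idx):
    """Entropy in bits of the marginal of p(x, y, q) on coordinates idx."""
    marg = defaultdict(float)
    for outcome, prob in p.items():
        marg[tuple(outcome[i] for i in idx)] += prob
    return -sum(v * math.log2(v) for v in marg.values() if v > 0)


def g_xy(p, rates):
    """Affine map g_{X,Y}: gaps of (R_A, R_B, R_C) to the bound L_GW of (31)."""
    ra, rb, rc = rates
    hx, hy, hxy = entropy(p, (0,)), entropy(p, (1,)), entropy(p, (0, 1))
    return (ra + rc - hx, rb + rc - hy, ra + rb + rc - hxy)


def gw_corner(p):
    """Gray-Wyner point (H(X|Q), H(Y|Q), I(X,Y;Q)) from Theorem 4.1."""
    hq = entropy(p, (2,))
    return (entropy(p, (0, 2)) - hq,
            entropy(p, (1, 2)) - hq,
            entropy(p, (0, 1)) + hq - entropy(p, (0, 1, 2)))


def tension_point(p):
    """(I(Y;Q|X), I(X;Q|Y), I(X;Y|Q))."""
    def h(*idx):
        return entropy(p, idx)
    return (h(0, 1) + h(0, 2) - h(0, 1, 2) - h(0),
            h(0, 1) + h(1, 2) - h(0, 1, 2) - h(1),
            h(0, 2) + h(1, 2) - h(0, 1, 2) - h(2))


# Illustrative p(x, y, q): Q is X XOR Y passed through a channel with flip probability 0.1.
p = {}
for x in (0, 1):
    for y in (0, 1):
        pxy = 0.4 if x == y else 0.1
        for q in (0, 1):
            p[(x, y, q)] = pxy * (0.9 if q == (x ^ y) else 0.1)

print(g_xy(p, gw_corner(p)), tension_point(p))
```

The two printed triples agree up to floating-point rounding.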

Though Theorem 4.3 shows that the region of tension 
is closely related to the Gray-Wyner region, it must 
be noted that the latter does not possess an essential 
monotonicity property of the region of tension that is 
discussed in Section V, and is therefore less suited for
the cryptographic application which motivates this paper. 

The relations (32) and (33) fall out of Theorem 4.3 
and Corollary 3.3. 



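Corollary 4.4:

$$C_{\mathrm{GK}}(X;Y) = \max_{\substack{(R_A, R_B, R_C) \in \mathcal{R}_{\mathrm{GW}}:\\ R_A + R_C = H(X),\ R_B + R_C = H(Y)}} R_C, \tag{32}$$
$$C_{\mathrm{GK}}(X;Y) = \max\big\{R : R \le I(X;Y),\ \{R_C = R\} \cap \mathcal{L}_{\mathrm{GW}}(X;Y) \subseteq \mathcal{R}_{\mathrm{GW}}(X;Y)\big\}. \tag{33}$$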



Another consequence of Theorem 4.3 is an expression for Wyner's common information $C_{\mathrm{Wyner}}(X;Y)$ in terms of $\mathfrak{T}(X;Y)$ (see Figure 1):

Corollary 4.5:

$$C_{\mathrm{Wyner}}(X;Y) = I(X;Y) + \min_{(r_1, r_2, 0) \in \mathfrak{T}(X;Y)} (r_1 + r_2). \tag{34}$$
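One way to see (34) is sketched below. It is offered as a reading aid rather than as the proof given in the appendix. By the identity (52), for any $p_{Q|XY}$,

$$I(X,Y;Q) = I(X;Y) + I(Y;Q|X) + I(X;Q|Y) - I(X;Y|Q).$$

Under the constraint $X - Q - Y$ of Theorem 4.2, the last term vanishes, so

$$I(X,Y;Q) = I(X;Y) + r_1 + r_2, \qquad (r_1, r_2, 0) = \big(I(Y;Q|X),\, I(X;Q|Y),\, I(X;Y|Q)\big) \in \mathfrak{T}(X;Y).$$

Minimizing over such $p_{Q|XY}$ then gives the right-hand side of (34). Taking the increasing hull does not lower the minimum: a point of $\mathfrak{T}(X;Y)$ with third coordinate 0 dominates a generating point that also has third coordinate 0, and that generating point has no larger $r_1 + r_2$.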
As we have seen already, one of the axis intercepts of $\mathfrak{T}(X;Y)$, namely $T^{\mathrm{int}}_3(X;Y)$, is closely connected to the GK common information: $C_{\mathrm{GK}}(X;Y) = I(X;Y) - T^{\mathrm{int}}_3(X;Y)$. The other two axis intercepts also turn out to be closely connected to certain quantities identified elsewhere in the context of source coding [20], [17]. Before we look at this connection, let us reinterpret these two axis intercepts using the fact that $\mathfrak{T}(X;Y) = \mathcal{R}_{\mathrm{ARI}}(X;Y)$ (Corollary 3.2).

In the context of the assisted common information system in Figure 4(b), consider a genie that has a link only to the user who receives the $X$ (resp. $Y$) source. $T^{\mathrm{int}}_1(X;Y)$ (resp. $T^{\mathrm{int}}_2(X;Y)$) is the rate at which this genie must communicate so that the users can produce a common random variable conditioned on which the sources are independent.$^7$ We have already seen in Theorem 2.2 that

$$T^{\mathrm{int}}_1(X;Y) = \min_{\substack{p_{Q|XY} \in \mathcal{P}_{X,Y}:\\ I(X;Q|Y) = I(X;Y|Q) = 0}} I(Y;Q|X), \tag{35}$$
$$T^{\mathrm{int}}_2(X;Y) = \min_{\substack{p_{Q|XY} \in \mathcal{P}_{X,Y}:\\ I(Y;Q|X) = I(X;Y|Q) = 0}} I(X;Q|Y). \tag{36}$$

We will show below that this pair is closely related to a pair of quantities identified in the context of lossless coding with side-information [20] and the Gray-Wyner system [17]. Let (following the notation of [17])

$$G(Y \to X) = \min\{R_C : (H(X|Y),\, H(Y) - R_C,\, R_C) \in \mathcal{R}_{\mathrm{GW}}(X;Y)\},$$
$$G(X \to Y) = \min\{R_C : (H(X) - R_C,\, H(Y|X),\, R_C) \in \mathcal{R}_{\mathrm{GW}}(X;Y)\}.$$

It has been shown [20], [17] that $G(Y \to X)$ is the smallest rate at which the side-information $Y$ may be coded and sent to a decoder that wants to recover $X$ with asymptotically vanishing probability of error, when that decoder receives $X$ coded at a rate of only $H(X|Y)$ (the minimum rate that allows such recovery). Further, [17] arrives at the maximum of $G(Y \to X)$ and $G(X \to Y)$ as a dual to the alternative characterization of $C_{\mathrm{GK}}$ in (33) from the Gray-Wyner system.

We prove the following relationship between the two 
pairs of quantities in the appendix. 

7 Though the definition allows for zero-rate communication to the 
other user and a zero-rate (but non-zero) residual conditional mutual 
information, it can be shown from the expression for these rates in 
(35)-(36) that there is a scheme which achieves exact conditional 
independence and requires no communication to the other user. The 
proof is similar to that of Corollary 3.3. 



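Corollary 4.6:

$$G(Y \to X) = I(X;Y) + T^{\mathrm{int}}_1(X;Y), \tag{37}$$
$$G(X \to Y) = I(X;Y) + T^{\mathrm{int}}_2(X;Y). \tag{38}$$

Further,

$$\begin{aligned}
&\min\big\{R : R \ge I(X;Y),\ \{R_C = R\} \cap \mathcal{L}_{\mathrm{GW}}(X;Y) \subseteq \mathcal{R}_{\mathrm{GW}}(X;Y)\big\} \\
&\quad = \max\big(G(Y \to X),\, G(X \to Y)\big) \qquad (39) \\
&\quad = I(X;Y) + \max\big(T^{\mathrm{int}}_1(X;Y),\, T^{\mathrm{int}}_2(X;Y)\big). \qquad (40)
\end{aligned}$$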
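As a reading aid for (37), here is a partial sketch of one inequality; the full proof is in the appendix. Apply the map $g_{X,Y}$ of Theorem 4.3 to the rate triple that appears in the definition of $G(Y \to X)$:

$$g_{X,Y}\begin{pmatrix} H(X|Y) \\ H(Y) - R_C \\ R_C \end{pmatrix} = \begin{pmatrix} H(X|Y) + R_C - H(X) \\ H(Y) - R_C + R_C - H(Y) \\ H(X|Y) + H(Y) - H(X,Y) \end{pmatrix} = \begin{pmatrix} R_C - I(X;Y) \\ 0 \\ 0 \end{pmatrix}.$$

Whenever this triple lies in $\mathcal{R}_{\mathrm{GW}}(X;Y)$, Theorem 4.3 places $(R_C - I(X;Y), 0, 0)$ in $\mathfrak{T}(X;Y)$, on the first axis. Hence $R_C \ge I(X;Y) + T^{\mathrm{int}}_1(X;Y)$, which gives $G(Y \to X) \ge I(X;Y) + T^{\mathrm{int}}_1(X;Y)$. The reverse inequality needs the argument in the appendix.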



V. Upperbounds on the Efficiency of 
Two-Party Secure Sampling Protocols 

We will now apply the concept of tension to derive upper bounds on the efficiency of two-party secure sampling protocols.

A two-party protocol $\Pi$ is specified by a pair of (possibly randomized) functions $\pi_{\mathrm{Alice}}$ and $\pi_{\mathrm{Bob}}$. Each party uses its function to operate on its current state $W$, producing a message $m$ (that is sent to the other party) and a new state $W'$ for itself. The initial state of the parties may consist of correlated random variables $(X, Y)$, with Alice's state being $X$ and Bob's state being $Y$; such a pair is called a set up for the protocol.

The protocol proceeds by the parties taking turns to apply their respective functions to their state and sending the resulting message to the other party; this message is added to the state of the other party. $\pi_{\mathrm{Alice}}$ and $\pi_{\mathrm{Bob}}$ also specify when the protocol terminates and produces output (instead of producing the next message in the protocol). A protocol is considered valid only if both parties terminate in a finite number of rounds (with probability 1).

The view of a party in an execution of the protocol is a random variable defined as the sequence of its states so far in the protocol execution. For a valid protocol $\Pi = (\pi_{\mathrm{Alice}}, \pi_{\mathrm{Bob}})$, we shall denote the final views of the two parties as $(\Pi^{\mathrm{view}}_{\mathrm{Alice}}(X;Y), \Pi^{\mathrm{view}}_{\mathrm{Bob}}(X;Y))$. Also, we shall denote the outputs as $(\Pi^{\mathrm{out}}_{\mathrm{Alice}}(X;Y), \Pi^{\mathrm{out}}_{\mathrm{Bob}}(X;Y))$. (Later, when it is clear, we abbreviate these as $(\Pi^{\mathrm{view}}_{\mathrm{Alice}}, \Pi^{\mathrm{view}}_{\mathrm{Bob}})$ and $(\Pi^{\mathrm{out}}_{\mathrm{Alice}}, \Pi^{\mathrm{out}}_{\mathrm{Bob}})$, respectively.)

Now we define (perfectly) secure sampling. (Exten- 
sion to statistically secure sampling, which allows a 
vanishing error, is treated in Section V-D.) 

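Definition 5.1: We say that a pair of correlated random variables $(U, V)$ can be (perfectly) securely sampled using a pair of correlated random variables $(X, Y)$ as set up if there exists a valid protocol $\Pi = (\pi_{\mathrm{Alice}}, \pi_{\mathrm{Bob}})$ such that

$$\big(\Pi^{\mathrm{out}}_{\mathrm{Alice}}(X;Y),\, \Pi^{\mathrm{out}}_{\mathrm{Bob}}(X;Y)\big) \sim p_{UV}, \tag{41}$$
$$\Pi^{\mathrm{view}}_{\mathrm{Alice}}(X;Y) - \Pi^{\mathrm{out}}_{\mathrm{Alice}}(X;Y) - \Pi^{\mathrm{out}}_{\mathrm{Bob}}(X;Y), \tag{42}$$
$$\Pi^{\mathrm{out}}_{\mathrm{Alice}}(X;Y) - \Pi^{\mathrm{out}}_{\mathrm{Bob}}(X;Y) - \Pi^{\mathrm{view}}_{\mathrm{Bob}}(X;Y). \tag{43}$$

In this case we say $\Pi^{(X,Y)} \rightsquigarrow (U, V)$.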

The three conditions above correspond to correctness (when neither party is corrupt), security for Bob when Alice is corrupt, and security for Alice when Bob is corrupt.

- The correctness condition (41) is obvious: the outputs $(\Pi^{\mathrm{out}}_{\mathrm{Alice}}(X;Y), \Pi^{\mathrm{out}}_{\mathrm{Bob}}(X;Y))$ must be distributed identically to $(U, V)$.
- Condition (42) says that even if Alice is "curious" (or "passively corrupt") and retains her view in the entire protocol, that view should give her no more information about Bob's output than her own output at the end of the protocol provides.
- Condition (43) is the symmetric condition for when Bob is curious.
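Conditions (42) and (43) are Markov chains. Suppose the joint pmf of (Alice's final view, Alice's output, Bob's output) has been tabulated for a small protocol. Then (42) can be tested by checking that $I(\Pi^{\mathrm{view}}_{\mathrm{Alice}}; \Pi^{\mathrm{out}}_{\mathrm{Bob}} \mid \Pi^{\mathrm{out}}_{\mathrm{Alice}}) = 0$.

The Python sketch below is illustrative only and is not part of the paper. The function names, the numerical tolerance, and the toy pmfs are assumptions. The toy pmfs compare two views:

- a view consisting of Alice's output plus an independent coin;
- a view that also leaks Bob's output.

```python
import math
from collections import defaultdict


def _entropy(p, idx):
    """Entropy in bits of the marginal of p on coordinates idx."""
    marg = defaultdict(float)
    for outcome, prob in p.items():
        marg[tuple(outcome[i] for i in idx)] += prob
    return -sum(v * math.log2(v) for v in marg.values() if v > 0)


def is_markov(p, tol=1e-9):
    """Check A - B - C for a joint pmf over triples (a, b, c) via I(A;C|B) = 0."""
    cmi = (_entropy(p, (0, 1)) + _entropy(p, (1, 2))
           - _entropy(p, (0, 1, 2)) - _entropy(p, (1,)))
    return cmi <= tol


# Hypothetical target pmf of (U, V) = (Alice's output, Bob's output).
p_uv = {(0, 0): 0.5, (1, 1): 0.25, (1, 0): 0.25}
secure, leaky = defaultdict(float), defaultdict(float)
for (u, v), puv in p_uv.items():
    for coin in (0, 1):
        # Triples are (Alice's view, Alice's output, Bob's output).
        secure[((u, coin), u, v)] += puv / 2
        leaky[((u, coin, v), u, v)] += puv / 2

print(is_markov(secure), is_markov(leaky))
```

In the leaky case, $I(\text{view}; V \mid U) = H(V|U) = 0.5$ bit, so (42) fails.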

A. Towards Measuring Cryptographic Content 

As mentioned in Section II, three information theoretic quantities were introduced in [29], and we identified them as the three axis intercepts of $\mathfrak{T}(X;Y)$. As shown in
[29], these quantities are "monotones" that can only 
decrease in a protocol, and if the protocol securely 
realizes a pair of correlated random variables (U, V) 
using a set up (X, Y), then each of these quantities 
should be at least as large for (X,Y) as for (U,V). 
Thus such a monotone can be thought of as a quantitative 
measure of cryptographic content in the sense that (U, V) 
with a higher cryptographic content cannot be generated 
from a set up (X, Y) with a lower cryptographic content. 

While the quantities in [29] do capture several in- 
teresting cryptographic properties, they paint a very 
incomplete picture. For instance, two pairs of correlated 
random variables $(X, Y)$ and $(X', Y')$ may have vastly
different values for these quantities, even if they are 
statistically close to each other, and hence have similar 
"cryptographic content." 

Instead, we shall consider the three-dimensional region $\mathfrak{T}(X;Y)$ and show that the region as a whole satisfies a monotonicity property. The region can only expand (grow towards the origin) when $(X, Y)$ evolve as the views of the two parties in a protocol, or as outputs "securely derived" from the views in a protocol. Hence, if the protocol securely realizes a pair of correlated random variables $(U, V)$ using a set up $(X, Y)$, then $\mathfrak{T}(X;Y)$ should be contained within $\mathfrak{T}(U;V)$.

As we shall see, the region $\mathfrak{T}(X;Y)$ has a non-trivial shape (see, for instance, Example 2.2). It can therefore yield much better bounds on the rate than the axis intercepts alone; in particular, $\mathfrak{T}(X;Y)$ can differentiate between pairs of correlated random variables that have the same axis intercepts. Further, $\mathfrak{T}(X;Y)$ is continuous as a function of $p_{X,Y}$, so one can derive rate bounds that apply to statistical security as well as to perfect security.

B. Monotone Regions for 2-Party Secure Protocols 

Definition 5.2: We will call a function $\mathcal{M}$ that maps a pair of random variables $X$ and $Y$ to an upward closed subset$^8$ of $\mathbb{R}^d_+$ (points in the $d$-dimensional real space with non-negative co-ordinates) a monotone region if it satisfies the following properties:

1) (Local computation cannot shrink it.) For all jointly distributed random variables $(X, Y, Z)$ with $X - Y - Z$, we have $\mathcal{M}(XY;Z) \supseteq \mathcal{M}(Y;Z)$ and $\mathcal{M}(X;YZ) \supseteq \mathcal{M}(X;Y)$.

2) (Communication cannot shrink it.) For all jointly distributed random variables $(X, Y)$ and functions $f$ (over the support of $X$ or $Y$), we have $\mathcal{M}(X;Yf(X)) \supseteq \mathcal{M}(X;Y)$ and $\mathcal{M}(Xf(Y);Y) \supseteq \mathcal{M}(X;Y)$.

3) (Securely derived outputs do not have smaller regions.) For all jointly distributed random variables $(X, U, V, Y)$ with $X - U - V$ and $U - V - Y$, we have $\mathcal{M}(U;V) \supseteq \mathcal{M}(XU;YV)$.

4) (Regions of independent pairs add up.) For independent pairs of jointly distributed random variables $(X_1, Y_1)$ and $(X_2, Y_2)$, we have $\mathcal{M}(X_1X_2;Y_1Y_2) = \mathcal{M}(X_1;Y_1) + \mathcal{M}(X_2;Y_2)$, where the $+$ sign denotes Minkowski sum. In other words, $\mathcal{M}(X_1X_2;Y_1Y_2) = \{a_1 + a_2 \mid a_1 \in \mathcal{M}(X_1;Y_1) \text{ and } a_2 \in \mathcal{M}(X_2;Y_2)\}$. (Here addition denotes coordinate-wise addition.)

Note that since $\mathcal{M}(X_1;Y_1)$ and $\mathcal{M}(X_2;Y_2)$ have non-negative co-ordinates and are upward closed, $\mathcal{M}(X_1;Y_1) + \mathcal{M}(X_2;Y_2)$ is contained in each of them.
This is consistent with the intuition that more crypto- 
graphic content (as would be the case with having more 
independent copies of the random variables) corresponds 
to a smaller region. 

Our definition of a monotone region strictly general- 
izes that suggested by [29]. The monotone in [29], which 
is a single real number $m$, can be interpreted as a one-dimensional region $[m, \infty)$ to fit our definition. (Note that a decrease in the value of $m$ corresponds to the region $[m, \infty)$ enlarging.)

$^8$ A subset $\mathcal{M}$ of $\mathbb{R}^d$ is called upward closed if $a \in \mathcal{M}$ and $a' \ge a$ (i.e., each co-ordinate of $a'$ is no less than that of $a$) implies that $a' \in \mathcal{M}$.
Theorem 5.1: Suppose $n_1$ independent copies of a pair of correlated random variables $(U, V)$ can be securely realized using $n_2$ independent copies of a pair of correlated random variables $(X, Y)$ as set up. Then for any monotone region $\mathcal{M}$, $n_2 \mathcal{M}(X;Y) \subseteq n_1 \mathcal{M}(U;V)$. (Here multiplication by an integer $n$ refers to $n$-times repeated Minkowski sum.)

Proof: Consider some protocol $\Pi$ such that $\Pi^{(X^{n_2}, Y^{n_2})} \rightsquigarrow (U^{n_1}, V^{n_1})$. Let $t$ be the maximum number of messages in the protocol. For $i = 0, \ldots, t$, let $(X_i, Y_i)$ denote the views of the parties after the $i$-th message. Then $(X_0, Y_0) = (X^{n_2}, Y^{n_2})$ and $(X_t, Y_t) = (\Pi^{\mathrm{view}}_{\mathrm{Alice}}, \Pi^{\mathrm{view}}_{\mathrm{Bob}})$.

By Condition (1) and Condition (2) of Definition 5.2, $\mathcal{M}(X_{i+1};Y_{i+1}) \supseteq \mathcal{M}(X_i;Y_i)$. (Note that we do allow the local computation defined by $\pi_{\mathrm{Alice}}$ and $\pi_{\mathrm{Bob}}$ to be randomized, but the randomness used is independent of the other party's view.)

By (41)-(43) as applied to $\Pi^{(X^{n_2}, Y^{n_2})} \rightsquigarrow (U^{n_1}, V^{n_1})$, and Condition (3), $\mathcal{M}(U^{n_1};V^{n_1}) = \mathcal{M}(\Pi^{\mathrm{out}}_{\mathrm{Alice}};\Pi^{\mathrm{out}}_{\mathrm{Bob}}) \supseteq \mathcal{M}(X_t;Y_t)$. Thus, $\mathcal{M}(U^{n_1};V^{n_1}) \supseteq \mathcal{M}(X^{n_2};Y^{n_2})$. Finally, by Condition (4) we obtain the claimed inclusion. ■

C. Using Tension to Bound Rate of Secure Sampling. 

Theorem 5.1 gives us a means to use an appropriate 
monotone region to bound the rate of securely sampling 
instances of a pair (U,V) from a set up (X,Y). We 
define this rate as follows (where $(X^n, Y^n)$ denotes $n$ independent copies of $(X, Y)$).

Definition 5.3: For pairs of correlated random variables $(U, V)$ and $(X, Y)$ (i.e., p.m.f.s $p_{UV}$ and $p_{XY}$), the rate of securely sampling $(U, V)$ from $(X, Y)$ is defined as$^9$

$$\sup\Big\{\frac{n_1}{n_2} : \exists\, \Pi, n_1, n_2 \text{ s.t. } \Pi^{(X^{n_2}, Y^{n_2})} \rightsquigarrow (U^{n_1}, V^{n_1})\Big\}.$$

Note that in Theorem 5.1, the $n$-times repeated Minkowski sum of $\mathcal{M}$ is

$$n\mathcal{M} = \{a_1 + \cdots + a_n \mid a_1, \ldots, a_n \in \mathcal{M}\}.$$

In general, the shape of the $n$-times Minkowski sum of a region changes with $n$, which would make it difficult to work with. But if $\mathcal{M}$ is convex, then this multiplication operation gives the same region as the following definition of multiplication by a real number $r > 0$:

$$r \cdot \mathcal{M} = \{ra \mid a \in \mathcal{M}\} \quad (\text{for convex } \mathcal{M}). \tag{44}$$
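To see why convexity matters in (44), consider the non-convex upward closed set $\mathcal{M} = \uparrow\{(0,1), (1,0)\} \subseteq \mathbb{R}^2_+$, i.e., all points that dominate $(0,1)$ or $(1,0)$. This toy region is our own illustration and does not arise from a specific pair $(X, Y)$.

The Python sketch below makes these choices, with hypothetical function names:

- An upward closed set is represented by a finite list of generators.
- Sums use the fact that $\uparrow A + \uparrow B = \uparrow(A + B)$.
- Scaling uses $r \cdot \uparrow A = \uparrow(rA)$.

It shows that $(1,1)$ lies in the 2-times Minkowski sum $2\mathcal{M}$ but not in the scaled set $2 \cdot \mathcal{M}$.

```python
def in_up_closure(point, generators):
    """True if point dominates (coordinate-wise >=) some generator."""
    return any(all(p >= g for p, g in zip(point, gen)) for gen in generators)


def minkowski_power(generators, n):
    """Generators of the n-times Minkowski sum of up(generators),
    using up(A) + up(B) = up(A + B)."""
    dim = len(generators[0])
    result = [(0.0,) * dim]
    for _ in range(n):
        result = [tuple(a + b for a, b in zip(r, g))
                  for r in result for g in generators]
    return result


def scale(generators, r):
    """Generators of r * up(A) = up(r * A), for r > 0."""
    return [tuple(r * c for c in g) for g in generators]


M = [(0.0, 1.0), (1.0, 0.0)]  # up{(0,1), (1,0)}: upward closed but not convex
point = (1.0, 1.0)
print(in_up_closure(point, minkowski_power(M, 2)))  # (0,1) + (1,0) = (1,1)
print(in_up_closure(point, scale(M, 2.0)))          # (0,2) and (2,0) are not below (1,1)
```

For a convex $\mathcal{M}$ no such gap can arise, which is the property used in (44).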

This gives us a convenient way to bound the rate if we use a convex monotone region. The following is an immediate corollary of Theorem 5.1 and of the fact that, for convex regions $\mathcal{M}_1$ and $\mathcal{M}_2$, $n_2 \mathcal{M}_2 \subseteq n_1 \mathcal{M}_1$ iff $\mathcal{M}_2 \subseteq \frac{n_1}{n_2} \mathcal{M}_1$.

$^9$ Here we let $n_1/n_2 = 0$ when $n_1 = n_2 = 0$.

Corollary 5.2: For any convex monotone region $\mathcal{M}$, if the rate of securely sampling $(U, V)$ from $(X, Y)$ is $r > 0$, then $\mathcal{M}(X;Y) \subseteq r \cdot \mathcal{M}(U;V)$. (Here, multiplication of a region by a real number is as in (44).)

The importance of the above corollary is that the 
region of tension provides us with a "good" convex 
monotone region, which can be used to obtain state-of- 
the-art bounds on the rate. 

Theorem 5.3: $\mathfrak{T}$ is a (3-dimensional) monotone region (as in Definition 5.2).

In fact, we shall show a more general result in Theo- 
rem 5.6, which implies the above theorem. Combined 
with the fact that $\mathfrak{T}$ is convex (Theorem 2.3), Theorem 5.3 and Corollary 5.2 yield the following result
(which will also be generalized in Corollary 5.7). 

Corollary 5.4: If the rate of securely sampling $(U, V)$ from $(X, Y)$ is $r > 0$, then $\mathfrak{T}(X;Y) \subseteq r \cdot \mathfrak{T}(U;V)$.

Note that this gives an upper bound on $r$: as $r$ increases from 0, the region $r \cdot \mathfrak{T}(U;V)$ shrinks away from the origin.

In general, we can obtain tighter bounds this way 
than yielded by the three monotones considered in [29] 
(namely, the axis intercepts of this monotone region), 
because the region of tension can "bulge" towards the 
origin. In other words, the intercepts, and in particular 
the common information of Gacs and Korner, do not by 
themselves capture subtle characteristics of correlation 
that are reflected in the shape of the monotone region. 
Below, we give a concrete example where the region of 
tension does give us a tighter bound than the monotones 
of [29]. 

Example 5.1: Consider the question of securely realizing $n_1$ independent pairs of random variables distributed according to $(U, V)$ in Example 2.2 from $n_2$ independent pairs of $(X, Y)$ in Example 2.1. The monotones in [29] give an upper bound of 1.930 on the rate $n_1/n_2$, whereas we show that $n_1/n_2 < 0.551$.

(For this we use the intersection of $\mathfrak{T}(U;V)$ with the plane $z = 0$ (Figure 3) and one point in the region $\mathfrak{T}(X;Y)$ (marked in Figure 2); then by Corollary 5.4, $0.1143 \ge 0.2075 \cdot r$. We do not claim this is the tightest bound obtainable from Corollary 5.4. We do not check whether $\mathfrak{T}(X;Y) \subseteq r \cdot \mathfrak{T}(U;V)$ for this value of $r$, since we do not compute the entire boundary of the two three-dimensional regions.)
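Spelled out, the last step of Example 5.1 is the following arithmetic:

$$0.1143 \ge 0.2075\, r \quad\Longrightarrow\quad r \le \frac{0.1143}{0.2075} \approx 0.5508 < 0.551.$$

For comparison, the monotones of [29] give a bound of 1.930.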
D. Statistical Security

Recall that the security conditions (41)-(43) for a protocol $\Pi$ sampling $(U, V)$ from a set up $(X, Y)$ relate $\Pi^{\mathrm{view}}_{\mathrm{Alice}}(X;Y)$, $\Pi^{\mathrm{out}}_{\mathrm{Alice}}(X;Y)$, $\Pi^{\mathrm{view}}_{\mathrm{Bob}}(X;Y)$, and $\Pi^{\mathrm{out}}_{\mathrm{Bob}}(X;Y)$ to $U$, $V$, and to each other. These conditions are for perfect security. A more realistic notion of security allows a small error in all three of these conditions; such a notion is referred to as statistical security. Below, we present a standard "simulation-based" definition of statistical security. (For readability, we abbreviate $\Pi^{\mathrm{view}}_{\mathrm{Alice}}(X;Y)$, $\Pi^{\mathrm{out}}_{\mathrm{Alice}}(X;Y)$, etc. as $\Pi^{\mathrm{view}}_{\mathrm{Alice}}$, $\Pi^{\mathrm{out}}_{\mathrm{Alice}}$, etc.)

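Definition 5.4: For $\epsilon > 0$, a protocol $\Pi$ is said to $\epsilon$-securely sample a pair of correlated random variables $(U, V)$ using a pair of correlated random variables $(X, Y)$ as set up if the following exist:

- a valid protocol $\Pi = (\pi_{\mathrm{Alice}}, \pi_{\mathrm{Bob}})$;
- random variables ("simulated views") $\Sigma^{\mathrm{view}}_{\mathrm{Alice}}$ and $\Sigma^{\mathrm{view}}_{\mathrm{Bob}}$ over the alphabets of $\Pi^{\mathrm{view}}_{\mathrm{Alice}}$ and $\Pi^{\mathrm{view}}_{\mathrm{Bob}}$, respectively, distributed according to $p_{\Sigma^{\mathrm{view}}_{\mathrm{Alice}}|U,V}$ and $p_{\Sigma^{\mathrm{view}}_{\mathrm{Bob}}|U,V}$,

such that

$$\Sigma^{\mathrm{view}}_{\mathrm{Alice}} - U - V \quad \text{and} \quad U - V - \Sigma^{\mathrm{view}}_{\mathrm{Bob}}, \tag{45}$$
$$\Delta\big((U, V),\, (\Pi^{\mathrm{out}}_{\mathrm{Alice}}, \Pi^{\mathrm{out}}_{\mathrm{Bob}})\big) \le \epsilon, \tag{46}$$
$$\Delta\big((\Sigma^{\mathrm{view}}_{\mathrm{Alice}}, V),\, (\Pi^{\mathrm{view}}_{\mathrm{Alice}}, \Pi^{\mathrm{out}}_{\mathrm{Bob}})\big) \le \epsilon, \tag{47}$$
$$\Delta\big((U, \Sigma^{\mathrm{view}}_{\mathrm{Bob}}),\, (\Pi^{\mathrm{out}}_{\mathrm{Alice}}, \Pi^{\mathrm{view}}_{\mathrm{Bob}})\big) \le \epsilon. \tag{48}$$

Here $\Delta(\cdot, \cdot)$ stands for the total variation distance. In this case we say $\Pi^{(X,Y)} \overset{\epsilon}{\rightsquigarrow} (U, V)$.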
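All three conditions (46)-(48) are stated in terms of $\Delta$. For finite alphabets, $\Delta$ equals half the $\ell_1$ distance between the pmfs, or equivalently the largest difference in the probability that the two distributions assign to any event.

Below is a minimal Python sketch. The function name is hypothetical, and the pmfs are made up only to show a check of the form (46).

```python
def total_variation(p, q):
    """Total variation distance (1/2) * sum_z |p(z) - q(z)| between two pmfs
    stored as dicts; outcomes missing from a dict have probability 0."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(z, 0.0) - q.get(z, 0.0)) for z in support)


# Hypothetical target pmf of (U, V) and hypothetical pmf of the protocol outputs.
p_uv = {(0, 0): 0.5, (1, 1): 0.5}
p_out = {(0, 0): 0.48, (1, 1): 0.5, (0, 1): 0.02}
eps = 0.05
print(total_variation(p_uv, p_out) <= eps)  # a check of the form (46)
```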

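Remark: $\Pi^{(X,Y)} \overset{0}{\rightsquigarrow} (U, V)$ if and only if $\Pi^{(X,Y)} \rightsquigarrow (U, V)$ (Definition 5.1).

- If $\Pi^{(X,Y)} \overset{0}{\rightsquigarrow} (U, V)$, then it can be shown that (42) and (43) hold (see, for instance, Lemma D.1).
- Conversely, if $\Pi^{(X,Y)} \rightsquigarrow (U, V)$, then one can take $p_{\Sigma^{\mathrm{view}}_{\mathrm{Alice}}|U,V} = p_{\Pi^{\mathrm{view}}_{\mathrm{Alice}}|\Pi^{\mathrm{out}}_{\mathrm{Alice}},\Pi^{\mathrm{out}}_{\mathrm{Bob}}}$ and $p_{\Sigma^{\mathrm{view}}_{\mathrm{Bob}}|U,V} = p_{\Pi^{\mathrm{view}}_{\mathrm{Bob}}|\Pi^{\mathrm{out}}_{\mathrm{Alice}},\Pi^{\mathrm{out}}_{\mathrm{Bob}}}$.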

Definition 5.5: We say $(U, V)$ can be statistically securely sampled using a pair of correlated random variables $(X, Y)$ as set up if, for any $\epsilon > 0$, there is a valid protocol $\Pi$ and positive integers $n_1, n_2$ such that $\Pi^{(X^{n_2}, Y^{n_2})} \overset{\epsilon}{\rightsquigarrow} (U^{n_1}, V^{n_1})$. Then, the rate of statistically securely sampling $(U, V)$ from $(X, Y)$ is defined as

$$\lim_{\epsilon \to 0} \sup\Big\{\frac{n_1}{n_2} : \exists\, \Pi, n_1, n_2 \text{ s.t. } \Pi^{(X^{n_2}, Y^{n_2})} \overset{\epsilon}{\rightsquigarrow} (U^{n_1}, V^{n_1})\Big\}.$$

Remark: The typical definition of security in the cryptography literature imposes two further requirements on the protocol $\Pi$. It must be uniform, i.e., the protocol for all values of $\epsilon$ can be implemented by a single Turing Machine that takes $\epsilon$ as input. It must also be "efficient", i.e., the Turing Machine implementing the protocol runs in time (say) polynomial in $\log(1/\epsilon)$.
Since we shall be proving negative results, using the 



weaker security definitions without these restrictions 
only strengthens our results. 

Robust Monotone Regions: We generalize the definition of a monotone region (Definition 5.2) by strengthening item (3) in the definition to the following conditions, to obtain the definition of a "robust monotone region."

Definition 5.6: We will call a function $\mathcal{M}$ that maps a pair of random variables $X$ and $Y$ to an upward closed subset of $\mathbb{R}^m$ a robust monotone region if it is a monotone region (as in Definition 5.2) and the following hold:

3') (Statistically securely derived outputs do not have a much smaller region.) There exists a constant $c > 0$ such that, for any jointly distributed random variables $(X,U,V,Y)$ and $\phi > 0$, if $I(X;V|U) \le \phi$ and $I(U;Y|V) \le \phi$, then
$$\mathcal{M}(U;V) \supseteq \mathcal{M}(XU;YV) + c\phi.$$

3'') (Continuity, Convexity and Closure.) There exists a bounded, continuous function $\delta : [0,1] \to \mathbb{R}_+$ with $\delta(0) = 0$, such that for any two pairs of correlated random variables $(X,Y)$ and $(X',Y')$, both over the alphabet $\mathcal{X} \times \mathcal{Y}$, and $\epsilon \in [0,1]$, if $\Delta(XY, X'Y') = \epsilon$, then $\mathcal{M}(X;Y) \subseteq \mathcal{M}(X';Y') - \delta(\epsilon)\log|\mathcal{X}||\mathcal{Y}|$. Also, $\mathcal{M}(X;Y)$ is convex and closed.

Note that condition (3) in Definition 5.2 is a restriction of condition (3') to the case $\phi = 0$.

In Appendix D we prove the following generalization 
of Corollary 5.2. 

Theorem 5.5: For any robust monotone region $\mathcal{M}$, if the rate of statistically securely sampling $(U, V)$ from $(X, Y)$ is $r > 0$, then $\mathcal{M}(X;Y) \subseteq r \cdot \mathcal{M}(U;V)$.

Also, we can generalize Theorem 5.3 as follows. 

Theorem 5.6: $\mathfrak{T}$ is a (3-dimensional) robust monotone region (as in Definition 5.6).

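Proof: We verify the four properties of a robust monotone region (see Definition 5.2 and Definition 5.6).

1) Local computation cannot shrink it: For all random variables with $X - Y - Z$, we need to show that $\mathfrak{T}(X;YZ) \supseteq \mathfrak{T}(X;Y)$ (and, symmetrically, that $\mathfrak{T}(XZ;Y) \supseteq \mathfrak{T}(X;Y)$ whenever $Z - X - Y$). The first inclusion follows from the fact that, for the joint p.m.f. $p_{XYZQ} = p_{XY}\,p_{Z|Y}\,p_{Q|XY}$, we have
$$\begin{aligned}
I(X;YZ|Q) &= I(X;Y|Q),\\
I(Q;YZ|X) &= I(Q;Y|X),\\
I(X;Q|YZ) &= I(X;Q|Y).
\end{aligned}$$
The second inclusion follows symmetrically.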
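As a sanity check of the three identities used in item 1), the following sketch (our own illustration, not part of the original argument) draws random finite distributions of the form $p_{XY}\,p_{Z|Y}\,p_{Q|XY}$ and compares both sides numerically. The helper names `entropy`, `cmi`, and `max_local_gap`, as well as the small alphabet sizes, are illustrative assumptions.

```python
import itertools
import math
import random
from collections import defaultdict


def entropy(pmf, idx):
    """Entropy in bits of the marginal of `pmf` on the coordinate indices `idx`."""
    marg = defaultdict(float)
    for outcome, p in pmf.items():
        marg[tuple(outcome[i] for i in idx)] += p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)


def cmi(pmf, a, b, c=()):
    """Conditional mutual information I(A;B|C) in bits; a, b, c are index tuples."""
    return (entropy(pmf, a + c) + entropy(pmf, b + c)
            - entropy(pmf, a + b + c) - entropy(pmf, c))


def rand_dist(rng, n):
    """A random, strictly positive pmf on {0, ..., n-1}, returned as a list."""
    w = [rng.random() + 1e-3 for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]


def local_joint(rng, nx=2, ny=3, nz=2, nq=3):
    """Joint pmf of (X, Y, Z, Q) = p_XY * p_{Z|Y} * p_{Q|XY}, so that X - Y - Z."""
    pxy = rand_dist(rng, nx * ny)
    pz_y = [rand_dist(rng, nz) for _ in range(ny)]
    pq_xy = [rand_dist(rng, nq) for _ in range(nx * ny)]
    pmf = {}
    for x, y, z, q in itertools.product(range(nx), range(ny), range(nz), range(nq)):
        k = x * ny + y
        pmf[(x, y, z, q)] = pxy[k] * pz_y[y][z] * pq_xy[k][q]
    return pmf


def max_local_gap(trials=25, seed=0):
    """Largest deviation, over random instances, from the three identities of item 1)."""
    X, Y, Z, Q = (0,), (1,), (2,), (3,)
    rng = random.Random(seed)
    gap = 0.0
    for _ in range(trials):
        p = local_joint(rng)
        gap = max(gap,
                  abs(cmi(p, X, Y + Z, Q) - cmi(p, X, Y, Q)),
                  abs(cmi(p, Q, Y + Z, X) - cmi(p, Q, Y, X)),
                  abs(cmi(p, X, Q, Y + Z) - cmi(p, X, Q, Y)))
    return gap


print(max_local_gap())  # only floating-point rounding is expected
```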

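2) Communication cannot shrink it: For all random variables $(X, Y)$ and functions $f$ over the support of $X$ (resp. $Y$), we have to show that $\mathfrak{T}(X;(Y,f(X))) \supseteq \mathfrak{T}(X;Y)$ (resp. $\mathfrak{T}((X,f(Y));Y) \supseteq \mathfrak{T}(X;Y)$). The first set inclusion follows from the following facts, for the joint p.m.f. $p_{XYQ} = p_{XY}\,p_{Q|XY}$ and the auxiliary random variable $(Q, f(X))$:
$$\begin{aligned}
I(X; Y, f(X) \,|\, Q, f(X)) &= I(X; Y \,|\, Q, f(X)) \le I(X;Y|Q),\\
I(X; Q, f(X) \,|\, Y, f(X)) &= I(X; Q \,|\, Y, f(X)) \le I(X;Q|Y),\\
I(Y, f(X); Q, f(X) \,|\, X) &= I(Y;Q|X).
\end{aligned}$$
(The inequalities hold because $f(X)$ is a function of $X$; e.g., $I(X;Y|Q) = I(f(X);Y|Q) + I(X;Y|Q,f(X))$.)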
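The relations in item 2) can be spot-checked in the same way. In this sketch (our own illustration), the message function is a hypothetical choice, the parity of a 4-ary $X$, and the auxiliary variable is $(Q, f(X))$; the function returns the smallest slack over random instances, which should be non-negative up to rounding.

```python
import itertools
import math
import random
from collections import defaultdict


def entropy(pmf, idx):
    """Entropy in bits of the marginal of `pmf` on the coordinate indices `idx`."""
    marg = defaultdict(float)
    for outcome, p in pmf.items():
        marg[tuple(outcome[i] for i in idx)] += p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)


def cmi(pmf, a, b, c=()):
    """Conditional mutual information I(A;B|C) in bits; a, b, c are index tuples."""
    return (entropy(pmf, a + c) + entropy(pmf, b + c)
            - entropy(pmf, a + b + c) - entropy(pmf, c))


def rand_dist(rng, n):
    """A random, strictly positive pmf on {0, ..., n-1}, returned as a list."""
    w = [rng.random() + 1e-3 for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]


def comm_joint(rng, nx=4, ny=2, nq=3):
    """Joint pmf of (X, Y, Q, F) with p_XY * p_{Q|XY} and the message F = f(X)."""
    pxy = rand_dist(rng, nx * ny)
    pq_xy = [rand_dist(rng, nq) for _ in range(nx * ny)]
    pmf = {}
    for x, y, q in itertools.product(range(nx), range(ny), range(nq)):
        k = x * ny + y
        pmf[(x, y, q, x % 2)] = pxy[k] * pq_xy[k][q]  # hypothetical message f(x) = x mod 2
    return pmf


def min_comm_slack(trials=25, seed=0):
    """Smallest slack in the item 2) relations; a negative value would be a violation."""
    X, Y, Q, F = (0,), (1,), (2,), (3,)
    rng = random.Random(seed)
    slack = math.inf
    for _ in range(trials):
        p = comm_joint(rng)
        slack = min(slack,
                    # I(X; Y,f(X) | Q,f(X)) <= I(X; Y | Q)
                    cmi(p, X, Y, Q) - cmi(p, X, Y + F, Q + F),
                    # I(X; Q,f(X) | Y,f(X)) <= I(X; Q | Y)
                    cmi(p, X, Q, Y) - cmi(p, X, Q + F, Y + F),
                    # I(Y,f(X); Q,f(X) | X) = I(Y; Q | X): report minus the absolute gap
                    -abs(cmi(p, Y + F, Q + F, X) - cmi(p, Y, Q, X)))
    return slack


print(min_comm_slack())
```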

3') Statistically securely derived outputs do not have a much smaller region: We let $c = 1$. Suppose $I(X;V|U) \le \phi$ and $I(U;Y|V) \le \phi$. We shall show that $\mathfrak{T}(U;V) \supseteq \mathfrak{T}(XU;VY) + \phi$. For this, it is enough to show that, for any $p_{Q|XUVY} \in \mathcal{P}_{XUVY}$, $T(U;V|Q) \le T(XU;VY|Q) + \phi$ (where the comparison is coordinate-wise and the addition applies to each coordinate). This is easy to see for the last coordinate, since $I(U;V|Q) \le I(XU;VY|Q) \le I(XU;VY|Q) + \phi$. For the second coordinate, note that
$$\begin{aligned}
I(XU;Q|VY) &\ge I(U;Q|VY)\\
&= I(U;QY|V) - I(U;Y|V)\\
&\ge I(U;Q|V) - I(U;Y|V).
\end{aligned}$$
Since $I(U;Y|V) \le \phi$, we have $I(U;Q|V) \le I(XU;Q|VY) + \phi$. Similarly, using $I(X;V|U) \le \phi$, we have $I(V;Q|U) \le I(VY;Q|XU) + \phi$.
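A hedged numerical sketch of item 3'): for random joint p.m.f.s of $(X,U,V,Y,Q)$ it evaluates the three coordinate-wise inequalities with $\phi$ replaced by the actual leakage terms $I(X;V|U)$ and $I(U;Y|V)$, which is the sharper form that the chain of inequalities above establishes. Alphabet sizes and function names are illustrative assumptions.

```python
import itertools
import math
import random
from collections import defaultdict


def entropy(pmf, idx):
    """Entropy in bits of the marginal of `pmf` on the coordinate indices `idx`."""
    marg = defaultdict(float)
    for outcome, p in pmf.items():
        marg[tuple(outcome[i] for i in idx)] += p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)


def cmi(pmf, a, b, c=()):
    """Conditional mutual information I(A;B|C) in bits; a, b, c are index tuples."""
    return (entropy(pmf, a + c) + entropy(pmf, b + c)
            - entropy(pmf, a + b + c) - entropy(pmf, c))


def random_joint(rng, sizes):
    """A random, strictly positive pmf on the product of range(s) for s in `sizes`."""
    outcomes = list(itertools.product(*(range(s) for s in sizes)))
    w = [rng.random() + 1e-3 for _ in outcomes]
    total = sum(w)
    return {o: wi / total for o, wi in zip(outcomes, w)}


def min_robust_slack(trials=25, seed=0):
    """Smallest slack in T(U;V|Q) <= T(XU;VY|Q) + leakage, checked coordinate-wise."""
    X, U, V, Y, Q = (0,), (1,), (2,), (3,), (4,)
    rng = random.Random(seed)
    slack = math.inf
    for _ in range(trials):
        p = random_joint(rng, (2, 2, 2, 2, 3))  # Q may depend on all of (X, U, V, Y)
        s1 = cmi(p, V + Y, Q, X + U) + cmi(p, X, V, U) - cmi(p, V, Q, U)  # first coordinate
        s2 = cmi(p, X + U, Q, V + Y) + cmi(p, U, Y, V) - cmi(p, U, Q, V)  # second coordinate
        s3 = cmi(p, X + U, V + Y, Q) - cmi(p, U, V, Q)                    # third coordinate
        slack = min(slack, s1, s2, s3)
    return slack


print(min_robust_slack())
```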

3'') Continuity follows from Theorem 2.5, with $\delta(\epsilon) = 2H_2(\epsilon) + \epsilon$ (so that the corresponding term in Theorem 2.5 is upper-bounded by $\delta(\epsilon)\log|\mathcal{X}||\mathcal{Y}|$). Convexity and closure follow from Theorem 2.3 and Theorem 2.4, respectively.

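4) Regions of independent pairs add up: If $(X_1, Y_1)$ is independent of $(X_2, Y_2)$, we have to show that $\mathfrak{T}(X_1X_2; Y_1Y_2) = \mathfrak{T}(X_1;Y_1) + \mathfrak{T}(X_2;Y_2)$. This follows easily from the following facts. For the joint p.m.f. $p_{X_1Y_1}\,p_{X_2Y_2}\,p_{Q_1|X_1Y_1}\,p_{Q_2|X_2Y_2}$, we have
$$\begin{aligned}
I(X_1X_2; Y_1Y_2|Q_1Q_2) &= I(X_1;Y_1|Q_1) + I(X_2;Y_2|Q_2),\\
I(X_1X_2; Q_1Q_2|Y_1Y_2) &= I(X_1;Q_1|Y_1) + I(X_2;Q_2|Y_2),\\
I(Y_1Y_2; Q_1Q_2|X_1X_2) &= I(Y_1;Q_1|X_1) + I(Y_2;Q_2|X_2),
\end{aligned}$$
which gives the inclusion $\supseteq$. And, for the joint p.m.f. $p_{X_1Y_1}\,p_{X_2Y_2}\,p_{Q|X_1Y_1X_2Y_2}$, setting $Q_1 = Q$ and $Q_2 = (Q, X_1, Y_1)$, we have
$$\begin{aligned}
I(X_1X_2; Y_1Y_2|Q) &\ge I(X_1;Y_1|Q_1) + I(X_2;Y_2|Q_2),\\
I(X_1X_2; Q|Y_1Y_2) &\ge I(X_1;Q_1|Y_1) + I(X_2;Q_2|Y_2),\\
I(Y_1Y_2; Q|X_1X_2) &\ge I(Y_1;Q_1|X_1) + I(Y_2;Q_2|X_2),
\end{aligned}$$
which gives the inclusion $\subseteq$. (Each of these follows from the chain rule; e.g., $I(X_1X_2;Y_1Y_2|Q) = I(X_1X_2;Y_1|Q) + I(X_1X_2;Y_2|QY_1) \ge I(X_1;Y_1|Q) + I(X_2;Y_2|QX_1Y_1)$, while the other two also use the independence of $(X_1,Y_1)$ and $(X_2,Y_2)$.)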
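Both families of relations in item 4) can be checked numerically. The sketch below is an illustration under our own choice of small binary and ternary alphabets: for the first family it draws independent $(X_1,Y_1,Q_1)$ and $(X_2,Y_2,Q_2)$; for the second it draws a single $Q$ depending on all four variables and takes $Q_1 = Q$ and $Q_2 = (Q, X_1, Y_1)$ as the auxiliary variables for the two pairs.

```python
import itertools
import math
import random
from collections import defaultdict


def entropy(pmf, idx):
    """Entropy in bits of the marginal of `pmf` on the coordinate indices `idx`."""
    marg = defaultdict(float)
    for outcome, p in pmf.items():
        marg[tuple(outcome[i] for i in idx)] += p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)


def cmi(pmf, a, b, c=()):
    """Conditional mutual information I(A;B|C) in bits; a, b, c are index tuples."""
    return (entropy(pmf, a + c) + entropy(pmf, b + c)
            - entropy(pmf, a + b + c) - entropy(pmf, c))


def rand_dist(rng, n):
    """A random, strictly positive pmf on {0, ..., n-1}, returned as a list."""
    w = [rng.random() + 1e-3 for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]


def random_joint(rng, sizes):
    """A random, strictly positive pmf on the product of range(s) for s in `sizes`."""
    outcomes = list(itertools.product(*(range(s) for s in sizes)))
    w = [rng.random() + 1e-3 for _ in outcomes]
    total = sum(w)
    return {o: wi / total for o, wi in zip(outcomes, w)}


def independent_q_gap(rng):
    """Family 1: Q1 depends only on (X1, Y1) and Q2 only on (X2, Y2); max |LHS - RHS|."""
    X1, Y1, Q1, X2, Y2, Q2 = (0,), (1,), (2,), (3,), (4,), (5,)
    p1 = random_joint(rng, (2, 2, 2))  # (X1, Y1, Q1)
    p2 = random_joint(rng, (2, 2, 2))  # (X2, Y2, Q2)
    p = {a + b: p1[a] * p2[b] for a in p1 for b in p2}
    return max(
        abs(cmi(p, X1 + X2, Y1 + Y2, Q1 + Q2) - cmi(p, X1, Y1, Q1) - cmi(p, X2, Y2, Q2)),
        abs(cmi(p, X1 + X2, Q1 + Q2, Y1 + Y2) - cmi(p, X1, Q1, Y1) - cmi(p, X2, Q2, Y2)),
        abs(cmi(p, Y1 + Y2, Q1 + Q2, X1 + X2) - cmi(p, Y1, Q1, X1) - cmi(p, Y2, Q2, X2)),
    )


def joint_q_slack(rng):
    """Family 2: Q depends on all of (X1, Y1, X2, Y2); uses Q1 = Q and Q2 = (Q, X1, Y1)."""
    X1, Y1, X2, Y2, Q = (0,), (1,), (2,), (3,), (4,)
    pa = random_joint(rng, (2, 2))  # (X1, Y1)
    pb = random_joint(rng, (2, 2))  # (X2, Y2), independent of (X1, Y1)
    pq = {o: rand_dist(rng, 3) for o in itertools.product(range(2), repeat=4)}
    p = {}
    for (x1, y1), (x2, y2), q in itertools.product(pa, pb, range(3)):
        p[(x1, y1, x2, y2, q)] = pa[(x1, y1)] * pb[(x2, y2)] * pq[(x1, y1, x2, y2)][q]
    Q1, Q2 = Q, Q + X1 + Y1
    return min(
        cmi(p, X1 + X2, Y1 + Y2, Q) - cmi(p, X1, Y1, Q1) - cmi(p, X2, Y2, Q2),
        cmi(p, X1 + X2, Q, Y1 + Y2) - cmi(p, X1, Q1, Y1) - cmi(p, X2, Q2, Y2),
        cmi(p, Y1 + Y2, Q, X1 + X2) - cmi(p, Y1, Q1, X1) - cmi(p, Y2, Q2, X2),
    )


def check_additivity(trials=20, seed=0):
    """Worst equality gap for family 1 and worst inequality slack for family 2."""
    rng = random.Random(seed)
    worst_gap = max(independent_q_gap(rng) for _ in range(trials))
    worst_slack = min(joint_q_slack(rng) for _ in range(trials))
    return worst_gap, worst_slack


print(check_additivity())  # gap close to 0, slack non-negative up to rounding
```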



Theorem 5.5 and Theorem 5.6 together yield a generalization of Corollary 5.4.

Corollary 5.7: If the rate of statistically securely sampling $(U, V)$ from $(X, Y)$ is $r > 0$, then $\mathfrak{T}(X;Y) \subseteq r \cdot \mathfrak{T}(U;V)$.

E. Bounding the Rate of Bit-OT from String-OT

Example 5.1 was contrived to highlight the shortcomings of prior work. We now give another example where the upper bound from our result strictly improves on prior work, and which is further interesting for two reasons: firstly, the new example is based on natural correlated random variables that are widely studied (namely, variants of oblivious transfer), and secondly, the new upper bound we can prove actually matches an easy lower bound and is therefore tight.

a) Bit-Oblivious Transfer and String-Oblivious Transfer: Oblivious Transfer, or OT [24], [25], is a pair of correlated random variables with great cryptographic significance. Several variants of OT have been considered in the literature. In particular, "bit-OT" corresponds to the following correlated pair of random variables: $A = (S_1, S_2)$ and $B = (C, S_C)$, where $S_1, S_2$ are two i.i.d. uniformly random bits and the "choice bit" $C$ is independent of $(S_1, S_2)$ and takes a uniformly random value in $\{1, 2\}$. Informally, in bit-OT, one of the two bits that Alice gets is transferred to Bob, but Alice is oblivious to which one was chosen to be transferred.
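A small sketch of the bit-OT joint p.m.f. just defined, written in Python under our own encoding ($A$ as the pair $(s_1, s_2)$, $B$ as the pair $(c, s_c)$). It enumerates the eight equally likely outcomes and computes $H(A|B)$ and $H(B|A)$; the helper names are illustrative.

```python
import itertools
import math
from collections import defaultdict


def entropy(pmf, idx):
    """Entropy in bits of the marginal of `pmf` on the coordinate indices `idx`."""
    marg = defaultdict(float)
    for outcome, p in pmf.items():
        marg[tuple(outcome[i] for i in idx)] += p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)


def bit_ot_pmf():
    """Joint pmf of bit-OT: A = (S1, S2), B = (C, S_C), with S1, S2, C independent and uniform."""
    pmf = defaultdict(float)
    for s1, s2, c in itertools.product((0, 1), (0, 1), (1, 2)):
        a = (s1, s2)
        b = (c, s1 if c == 1 else s2)  # Bob learns the bit selected by C
        pmf[(a, b)] += 1 / 8
    return dict(pmf)


pmf = bit_ot_pmf()
A, B = (0,), (1,)
h_a_given_b = entropy(pmf, A + B) - entropy(pmf, B)
h_b_given_a = entropy(pmf, A + B) - entropy(pmf, A)
print(len(pmf), h_a_given_b, h_b_given_a)  # 8 outcomes; both conditional entropies are 1 bit
```

The eight outcomes correspond to the eight edges of Fig. 7a, and $H(A|B) = H(B|A) = 1$ bit.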

It is well known that, qualitatively, the different forms of OT are all equivalent, in the sense that pairs of one variant can be securely sampled using pairs of another variant as set up (see, for instance, [19]). However, the rate at which this can be done has not been well studied. That these rates are non-zero follows from a recent result in [16]. We are interested in upper-bounding this rate (and indeed, when possible, calculating it exactly).

Consider the rate of sampling bit-OT from a generalization of bit-OT called "string-OT", where Alice receives two $L$-bit strings $S_1, S_2$ instead of two bits (and one of those strings is obliviously transmitted to Bob). It is not hard to see that the rate of sampling bit-OT from string-OT is 1, intuitively because a single instance of string-OT provides only one bit $C$ that is hidden from Alice. (In terms of the monotones, the intercept of $\mathfrak{T}(A;B)$ on the first axis is $(1, 0, 0)$ for string-OT, i.e., $T_1^{\text{int}}(A;B) = 1$, independent of the length of the strings.) But what if we consider two string-OTs together, one in each direction? In this case, there are $L$ bits with Bob that are hidden from Alice, and vice versa. We ask if we can sample OT from this set up at a rate larger than 1 (in particular, linear in $L$).

Formally, we consider the set up $(X, Y)$ and target random variables $(U, V)$ as defined below. Let $S_{A,1}, S_{A,2}, S_{B,1}, S_{B,2} \in \{0,1\}^L$ and $C_A, C_B \in \{1,2\}$ be six independent random variables, all of which are uniformly distributed over their alphabets. Consider a pair of random variables $X, Y$ defined as $X = (C_A, S_{A,1}, S_{A,2}, S_{B,C_A})$ and $Y = (C_B, S_{B,1}, S_{B,2}, S_{A,C_B})$. (Note that $(S_{A,1}, S_{A,2}, C_B)$ and $(S_{B,1}, S_{B,2}, C_A)$ correspond to the two instances of $L$-bit string-OT, one in each direction.) Let $U, V$ be a pair of random variables whose joint distribution is the same as that of $X, Y$, but with $L = 1$. In other words, $U, V$ are a pair of independent bit-OTs in opposite directions.

It is easy to see that $\mathfrak{T}(X;Y)$ intersects the coordinate axes at $(1 + L, 0, 0)$, $(0, 1 + L, 0)$, and $(0, 0, 2L)$. From these, we can immediately obtain the upper bound of [29] on the efficiency, namely $(1 + L)/2$. Notice that this bound depends on $L$ and would suggest that (several) long string-OT pairs can be turned into several (more) bit-OT pairs. However, as we show below, the efficiency of conversion is just 1, i.e., the best one can do is to turn each pair of string-OTs into a pair of bit-OTs.

To see this, we need to consider a point on $\mathfrak{T}(X;Y)$ other than the three axis intercepts. By setting $Q = (C_A, C_B, S_{A,C_B}, S_{B,C_A})$, we get $T(X;Y|Q) = (1,1,0)$: given $X$, the variable $Q$ additionally reveals only $C_B$; given $Y$, it additionally reveals only $C_A$; and given $Q$, the remaining unknown parts of $X$ and $Y$ are independent. That is, $\mathfrak{T}(X;Y)$ contains the point $(1,1,0)$, independent of $L$. This already bounds the rate of sampling $(U, V)$ from $(X, Y)$ as set up by some constant. To show that this constant is 1, we shall show that $(1,1,0)$ lies on the boundary of $\mathfrak{T}(U;V)$. Then it follows from Corollary 5.7 that the rate of (statistically) secure sampling is upper-bounded by 1.
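To double-check that $T(X;Y|Q) = (1,1,0)$ regardless of $L$, the following sketch enumerates the set up exactly for small $L$, encoding each $L$-bit string as an integer. The enumeration and helper names are our own; it is only practical for small $L$, since the number of outcomes grows as $2^{4L+2}$.

```python
import itertools
import math
from collections import defaultdict


def entropy(pmf, idx):
    """Entropy in bits of the marginal of `pmf` on the coordinate indices `idx`."""
    marg = defaultdict(float)
    for outcome, p in pmf.items():
        marg[tuple(outcome[i] for i in idx)] += p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)


def cmi(pmf, a, b, c=()):
    """Conditional mutual information I(A;B|C) in bits; a, b, c are index tuples."""
    return (entropy(pmf, a + c) + entropy(pmf, b + c)
            - entropy(pmf, a + b + c) - entropy(pmf, c))


def setup_pmf(L):
    """Joint pmf of (X, Y, Q) for two independent L-bit string-OTs in opposite
    directions, with Q = (C_A, C_B, S_{A,C_B}, S_{B,C_A}); strings are encoded as ints."""
    n = 2 ** L
    weight = 1.0 / (4 * n ** 4)
    pmf = defaultdict(float)
    for sa1, sa2, sb1, sb2 in itertools.product(range(n), repeat=4):
        sa, sb = (sa1, sa2), (sb1, sb2)
        for ca, cb in itertools.product((1, 2), repeat=2):
            x = (ca, sa1, sa2, sb[ca - 1])        # Alice's view X
            y = (cb, sb1, sb2, sa[cb - 1])        # Bob's view Y
            q = (ca, cb, sa[cb - 1], sb[ca - 1])  # the auxiliary Q from the text
            pmf[(x, y, q)] += weight
    return dict(pmf)


def tension_vector(pmf):
    """T(X;Y|Q) = (I(Y;Q|X), I(X;Q|Y), I(X;Y|Q)) for a pmf over (X, Y, Q)."""
    X, Y, Q = (0,), (1,), (2,)
    return (cmi(pmf, Y, Q, X), cmi(pmf, X, Q, Y), cmi(pmf, X, Y, Q))


for L in (1, 2, 3):
    print(L, [round(t, 9) for t in tension_vector(setup_pmf(L))])
```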

To show that $(1,1,0)$ lies on the boundary of $\mathfrak{T}(U;V)$, we show that $\inf\{R_1 + R_2 : (R_1,R_2,0) \in \mathfrak{T}(U;V)\} = 2$. Since $\mathfrak{T}(U;V)$ is a monotone region (Theorem 5.3), by property (4) of Definition 5.2 the regions of independent pairs add up. Hence, we need only characterize $\inf\{R_1 + R_2 : (R_1,R_2,0) \in \mathfrak{T}(A;B)\}$, where $(A, B)$ is a single bit-OT: $A = (S_1,S_2) \in \{0,1\}^2$ is uniformly distributed over its alphabet and $B = (C, S_C)$, where $C \in \{1,2\}$ is independent of $A$ and uniformly distributed over its alphabet.



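$$\begin{aligned}
\inf\{R_1 + R_2 : (R_1,R_2,0) \in \mathfrak{T}(A;B)\}
&= \inf_{\substack{p_{Q|AB} \in \mathcal{P}_{AB}:\\ I(A;B|Q)=0}} \big[I(B;Q|A) + I(A;Q|B)\big]\\
&= H(A|B) + H(B|A) - \sup_{\substack{p_{Q|AB} \in \mathcal{P}_{AB}:\\ I(A;B|Q)=0}} \big[H(A|QB) + H(B|QA)\big].
\end{aligned}$$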

We show below that the sup term is 1. Since $H(A|B) + H(B|A) = 2$, this will allow us to conclude that the smallest sum-rate $R_1 + R_2$ such that $(R_1,R_2,0) \in \mathfrak{T}(A;B)$ is 1. Invoking the additivity of regions of independent pairs (property (4) of Definition 5.2), the corresponding smallest sum-rate for $U, V$ is then 2, as required.

To show that the sup term is 1, first note that the value 1 is attained, e.g., by $Q = A$, for which $H(A|QB) = 0$ and $H(B|QA) = H(B|A) = 1$; so it suffices to show that the sup is at most 1. Notice that the only valid choices of $p_{Q|AB}$ are such that $I(A;B|Q) = 0$. This means that the resulting $p_{AB|Q}(\cdot,\cdot|q)$ must belong to one of the eight possible classes shown in Figure 7b (for any $q$ with non-zero probability $p_Q(q)$; we may assume without loss of generality that all $q$'s have non-zero probability). Recall that there is a cardinality bound on $Q$; let us denote the alphabet of $Q$ by $\{q_1, q_2, \ldots, q_N\}$, where $N$ is the cardinality bound.

We will first show that there is no loss of generality in assuming that no two of the $q_i$'s have $p_{AB|Q}(\cdot,\cdot|q_i)$ belonging to the same class (and hence we may take $N = 8$). Suppose $q_1$ and $q_2$ belong to the same class, say class 1, with parameters $p_1$ and $p_2$ respectively. Then, if we denote the binary entropy function by $H_2(\cdot)$, we have
$$\begin{aligned}
H(A|QB) + H(B|QA) &= \sum_{k=1}^{N} p_Q(q_k)\big(H(A|B,Q=q_k) + H(B|A,Q=q_k)\big)\\
&= p_Q(q_1)H_2(p_1) + p_Q(q_2)H_2(p_2) + \sum_{k=3}^{N} p_Q(q_k)\big(H(A|B,Q=q_k) + H(B|A,Q=q_k)\big)\\
&\le \big(p_Q(q_1)+p_Q(q_2)\big)\,H_2\!\left(\frac{p_Q(q_1)p_1 + p_Q(q_2)p_2}{p_Q(q_1)+p_Q(q_2)}\right) + \sum_{k=3}^{N} p_Q(q_k)\big(H(A|B,Q=q_k) + H(B|A,Q=q_k)\big),
\end{aligned}$$
where the inequality (Jensen's) follows from the concavity of the binary entropy function. Thus, we can define a $Q'$ with alphabet size $N - 1$, where the letters $q_1, q_2$ are replaced by a single letter $q_0$ such that $p_{Q'}(q_0) = p_Q(q_1) + p_Q(q_2)$ and $p_{AB|Q'=q_0}$ is in class 1 with parameter $\frac{p_Q(q_1)p_1 + p_Q(q_2)p_2}{p_Q(q_1)+p_Q(q_2)}$, while maintaining, for $i = 3, \ldots, N$, $p_{Q'}(q_i) = p_Q(q_i)$ and $p_{AB|Q'}(a,b|q_i) = p_{AB|Q}(a,b|q_i)$. (It is easy to verify (a) that this gives a valid joint p.m.f. $p_{ABQ'}$, (b) that the induced $p_{AB}$ is the same as the original, and (c) that the induced $p_{Q'|AB}$ satisfies the condition $I(A;B|Q') = 0$.) Then, the above inequality states that
$$H(A|QB) + H(B|QA) \le H(A|Q'B) + H(B|Q'A),$$
proving our claim.
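The merging step rests only on the concavity of $H_2$. A tiny sketch (our own illustration, with hypothetical weights $w_1, w_2$ standing for $p_Q(q_1), p_Q(q_2)$) computes the change in the objective caused by merging two class-1 letters and confirms numerically that it is never negative.

```python
import math
import random


def h2(p):
    """Binary entropy in bits, with H2(0) = H2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def merge_gain(w1, p1, w2, p2):
    """Objective after merging two class-1 letters (weights w1, w2, parameters p1, p2)
    into one letter with the weighted-average parameter, minus the objective before."""
    w = w1 + w2
    merged = (w1 * p1 + w2 * p2) / w
    return w * h2(merged) - (w1 * h2(p1) + w2 * h2(p2))


rng = random.Random(0)
worst = min(merge_gain(rng.random() + 1e-6, rng.random(), rng.random() + 1e-6, rng.random())
            for _ in range(10_000))
print(worst)  # concavity of H2 guarantees this is not meaningfully below zero
```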

[Fig. 7: (a) Joint p.m.f. of $A, B$. Each solid line represents a probability mass of 1/8. (b) Eight possible classes that $p_{AB|Q}(\cdot,\cdot|q)$ may belong to for a $p_{Q|AB}$ which satisfies $I(A;B|Q) = 0$.]

Thus, without loss of generality, we may assume that $N = 8$ and that $p_{AB|Q}(\cdot,\cdot|q_i)$ belongs to class $i$. Writing $a = s_1s_2$ for the value of $A = (S_1,S_2)$ and $b = cs$ for the value of $B = (C,S_C)$, notice that



$$\begin{aligned}
p_{Q|AB}(q_1|00,10) + p_{Q|AB}(q_5|00,10) &= 1,\\
p_{Q|AB}(q_2|01,10) + p_{Q|AB}(q_5|01,10) &= 1,\\
p_{Q|AB}(q_2|01,21) + p_{Q|AB}(q_6|01,21) &= 1,\\
p_{Q|AB}(q_3|11,21) + p_{Q|AB}(q_6|11,21) &= 1,\\
p_{Q|AB}(q_3|11,11) + p_{Q|AB}(q_7|11,11) &= 1,\\
p_{Q|AB}(q_4|10,11) + p_{Q|AB}(q_7|10,11) &= 1,\\
p_{Q|AB}(q_4|10,20) + p_{Q|AB}(q_8|10,20) &= 1,\\
p_{Q|AB}(q_1|00,20) + p_{Q|AB}(q_8|00,20) &= 1.
\end{aligned}$$
Let us define
$$\begin{aligned}
p_1 &= p_{Q|AB}(q_1|00,10), & p_5 &= p_{Q|AB}(q_5|01,10),\\
p_2 &= p_{Q|AB}(q_2|01,21), & p_6 &= p_{Q|AB}(q_6|11,21),\\
p_3 &= p_{Q|AB}(q_3|11,11), & p_7 &= p_{Q|AB}(q_7|10,11),\\
p_4 &= p_{Q|AB}(q_4|10,20), & p_8 &= p_{Q|AB}(q_8|00,20).
\end{aligned}$$

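Let us evaluate $H(B|QA)$ in terms of the above parameters. Notice that $H(B|Q=q_i,A) = 0$ for $i = 5, \ldots, 8$. Hence
$$\begin{aligned}
H(B|QA) &= \sum_{(i,a) \in \{(1,00),(2,01),(3,11),(4,10)\}} p_{QA}(q_i,a)\,H(B|Q=q_i,A=a)\\
&= \frac{p_1 + (1-p_8)}{8}\,H_2\!\left(\frac{p_1}{p_1 + (1-p_8)}\right) + \frac{p_2 + (1-p_5)}{8}\,H_2\!\left(\frac{p_2}{p_2 + (1-p_5)}\right)\\
&\quad + \frac{p_3 + (1-p_6)}{8}\,H_2\!\left(\frac{p_3}{p_3 + (1-p_6)}\right) + \frac{p_4 + (1-p_7)}{8}\,H_2\!\left(\frac{p_4}{p_4 + (1-p_7)}\right)\\
&\le \frac{4 + \sum_{i=1}^{4} p_i - \sum_{i=5}^{8} p_i}{8},
\end{aligned}$$
where the inequality follows from the fact that the binary entropy function is upper-bounded by 1. Similarly, we can get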



$$H(A|QB) \le \frac{4 + \sum_{i=5}^{8} p_i - \sum_{i=1}^{4} p_i}{8}.$$

Combining, we obtain, as desired,
$$H(B|QA) + H(A|QB) \le 1.$$
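The whole construction can be exercised numerically. In this sketch (our own encoding of Fig. 7a, with $a = s_1s_2$ and $b = cs$ as strings), each edge $(a,b)$ of mass 1/8 is split between its two admissible classes according to the parameters $p_1, \ldots, p_8$ defined above. For random parameters it checks that $I(A;B|Q) = 0$, that the $(A,B)$ marginal is unchanged, and that $H(A|QB) + H(B|QA) \le 1$; the names `EDGES`, `joint_abq`, and `objective` are illustrative.

```python
import math
import random
from collections import defaultdict


def entropy(pmf, idx):
    """Entropy in bits of the marginal of `pmf` on the coordinate indices `idx`."""
    marg = defaultdict(float)
    for outcome, p in pmf.items():
        marg[tuple(outcome[i] for i in idx)] += p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)


def cmi(pmf, a, b, c=()):
    """Conditional mutual information I(A;B|C) in bits; a, b, c are index tuples."""
    return (entropy(pmf, a + c) + entropy(pmf, b + c)
            - entropy(pmf, a + b + c) - entropy(pmf, c))


# Edges (a, b) of the bit-OT graph. For an entry (a, b, k, other),
# Q = q_k with probability p_k and Q = q_other with probability 1 - p_k.
EDGES = [
    ("00", "10", 1, 5), ("01", "10", 5, 2), ("01", "21", 2, 6), ("11", "21", 6, 3),
    ("11", "11", 3, 7), ("10", "11", 7, 4), ("10", "20", 4, 8), ("00", "20", 8, 1),
]


def joint_abq(p):
    """Joint pmf of (A, B, Q) for parameters p = {k: p_k, k = 1..8}; each edge has mass 1/8."""
    pmf = defaultdict(float)
    for a, b, k, other in EDGES:
        pmf[(a, b, k)] += p[k] / 8
        pmf[(a, b, other)] += (1 - p[k]) / 8
    return dict(pmf)


def objective(pmf):
    """H(A|QB) + H(B|QA) for a pmf over (A, B, Q)."""
    A, B, Q = (0,), (1,), (2,)
    h_abq = entropy(pmf, A + B + Q)
    return (h_abq - entropy(pmf, B + Q)) + (h_abq - entropy(pmf, A + Q))


def ab_marginal(pmf):
    """Marginal pmf of (A, B)."""
    marg = defaultdict(float)
    for (a, b, _q), pr in pmf.items():
        marg[(a, b)] += pr
    return dict(marg)


rng = random.Random(0)
worst_obj, worst_cmi = 0.0, 0.0
for _ in range(2000):
    pmf = joint_abq({k: rng.random() for k in range(1, 9)})
    worst_obj = max(worst_obj, objective(pmf))
    worst_cmi = max(worst_cmi, abs(cmi(pmf, (0,), (1,), (2,))))
print(worst_obj, worst_cmi)  # the objective stays at most 1; I(A;B|Q) stays 0 up to rounding
```

Setting $p_1 = \cdots = p_4 = 1$ and $p_5 = \cdots = p_8 = 0$ makes $Q$ identify $A$, which attains the value 1.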

Remark: Note that we have actually shown that, for bit-OT $(A,B)$, the intersection of $\mathfrak{T}(A;B)$ with the plane $z = 0$ is the increasing hull of the line segment between $(1, 0, 0)$ and $(0, 1, 0)$. This follows from what we showed above (i.e., $\inf\{R_1 + R_2 : (R_1,R_2,0) \in \mathfrak{T}(A;B)\} = 1$), combined with the facts that the axis intercepts of $\mathfrak{T}(A;B)$ are $(1,0,0)$ and $(0,1,0)$ (i.e., $T_1^{\text{int}}(A;B) = T_2^{\text{int}}(A;B) = 1$) and that $\mathfrak{T}(A;B)$ is convex.

Appendix A

Details Omitted from Section II

Lemma A.1: Given a pair of random variables $(X,Y)$ and a p.m.f. $p_{Q|XY}$ such that $I(Y;Q|X) = I(X;Q|Y) = 0$, there exists a p.m.f. $p_{Q'|XY}$ such that $H(Q'|X) = H(Q'|Y) = 0$ and $Q - Q' - XY$.

Proof: Suppose $p_{Q|XY}$ is such that $I(Y;Q|X) = I(X;Q|Y) = 0$. Then
$$p_{Q|XY}(q|x,y) = p_{Q|X}(q|x) = p_{Q|Y}(q|y).$$

Hence, for all $(x,y)$ such that $p_{XY}(x,y) > 0$, we must have $p_{Q|X}(q|x) = p_{Q|Y}(q|y)$ for all $q$. This implies that, in the characteristic bipartite graph (which has vertices in $\mathcal{X} \cup \mathcal{Y}$ and an edge between $x \in \mathcal{X}$ and $y \in \mathcal{Y}$ if and only if $p_{XY}(x,y) > 0$), for each connected component $\mathcal{C} \subseteq \mathcal{X} \cup \mathcal{Y}$ there is a distribution $p^{\mathcal{C}}_Q$ such that $p_{Q|X}(q|x) = p^{\mathcal{C}}_Q(q)$ for all $x \in \mathcal{C} \cap \mathcal{X}$ and all $q$; similarly, $p_{Q|Y}(q|y) = p^{\mathcal{C}}_Q(q)$ for all $y \in \mathcal{C} \cap \mathcal{Y}$ and all $q$. Define $p_{Q'|XY}$ over the set of connected components of this graph such that, with probability 1, $Q'$ is the connected component $\mathcal{C}(X,Y)$ to which the vertices $X$ and $Y$ belong (and hence $H(Q'|X) = H(Q'|Y) = 0$), and let $p_{Q|Q'}(q|\mathcal{C}) = p^{\mathcal{C}}_Q(q)$. Then $p_{Q|XY}(q|x,y) = p_{Q|X}(q|x) = p^{\mathcal{C}(x,y)}_Q(q) = p_{Q|Q'}(q|\mathcal{C}(x,y))$, so that $Q - Q' - XY$.
■

The following simple information-theoretic identities for three jointly distributed random variables $X, Y, Q$ are used at several places in this paper:
$$I(Y;Q|X) = I(XY;Q) - I(X;Q) = H(X|Q) + I(XY;Q) - H(X), \tag{49}$$
$$I(X;Q|Y) = I(XY;Q) - I(Y;Q) = H(Y|Q) + I(XY;Q) - H(Y), \tag{50}$$
$$I(X;Y|Q) = H(X|Q) + H(Y|Q) - H(XY|Q) = H(X|Q) + H(Y|Q) + I(XY;Q) - H(XY), \tag{51}$$
$$I(X;Y|Q) = I(X;Y) + I(Y;Q|X) + I(X;Q|Y) - I(XY;Q). \tag{52}$$
The first three equalities are easy to follow. The last one can be obtained by subtracting the first two from the third.
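Identities (49)-(52) are easy to confirm numerically. The sketch below (our own illustration, with arbitrary small alphabet sizes) evaluates both sides on random joint p.m.f.s of $(X,Y,Q)$ and reports the largest discrepancy, which should be at the level of floating-point rounding.

```python
import itertools
import math
import random
from collections import defaultdict


def entropy(pmf, idx):
    """Entropy in bits of the marginal of `pmf` on the coordinate indices `idx`."""
    marg = defaultdict(float)
    for outcome, p in pmf.items():
        marg[tuple(outcome[i] for i in idx)] += p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)


def cmi(pmf, a, b, c=()):
    """Conditional mutual information I(A;B|C) in bits; a, b, c are index tuples."""
    return (entropy(pmf, a + c) + entropy(pmf, b + c)
            - entropy(pmf, a + b + c) - entropy(pmf, c))


def random_joint(rng, sizes):
    """A random, strictly positive pmf on the product of range(s) for s in `sizes`."""
    outcomes = list(itertools.product(*(range(s) for s in sizes)))
    w = [rng.random() + 1e-3 for _ in outcomes]
    total = sum(w)
    return {o: wi / total for o, wi in zip(outcomes, w)}


def identity_gap(pmf):
    """Largest absolute gap among the identities (49)-(52) for a pmf over (X, Y, Q)."""
    X, Y, Q = (0,), (1,), (2,)

    def h(idx):
        return entropy(pmf, idx)

    i_xy_q = cmi(pmf, X + Y, Q)  # I(XY;Q)
    gaps = [
        cmi(pmf, Y, Q, X) - (i_xy_q - cmi(pmf, X, Q)),                             # (49), first form
        cmi(pmf, Y, Q, X) - (h(X + Q) - h(Q) + i_xy_q - h(X)),                     # (49), second form
        cmi(pmf, X, Q, Y) - (i_xy_q - cmi(pmf, Y, Q)),                             # (50), first form
        cmi(pmf, X, Q, Y) - (h(Y + Q) - h(Q) + i_xy_q - h(Y)),                     # (50), second form
        cmi(pmf, X, Y, Q) - (h(X + Q) + h(Y + Q) - 2 * h(Q) + i_xy_q - h(X + Y)),  # (51)
        cmi(pmf, X, Y, Q) - (cmi(pmf, X, Y) + cmi(pmf, Y, Q, X)
                             + cmi(pmf, X, Q, Y) - i_xy_q),                        # (52)
    ]
    return max(abs(g) for g in gaps)


def worst_identity_gap(trials=50, seed=0):
    """Largest gap over random pmfs with |X| = 3, |Y| = 2, |Q| = 4."""
    rng = random.Random(seed)
    return max(identity_gap(random_joint(rng, (3, 2, 4))) for _ in range(trials))


print(worst_identity_gap())
```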

The following calculation is useful in applying the 
above lemma in a couple of our proofs. 

Lemma A.2: For correlated random variables $(X, Y, Q, Q')$, if $H(Q'|X) = H(Q'|Y) = 0$ and $Q - Q' - XY$, then $I(X;Y|Q') \le I(X;Y|Q)$.

Proof: Note that (52) gives
$$I(X;Y|Q) = I(X;Y) - I(XY;Q) + I(Y;Q|X) + I(X;Q|Y).$$
Since $H(Q'|X) = H(Q'|Y) = 0$, we have $I(Y;Q'|X) = I(X;Q'|Y) = 0$, and hence
$$I(X;Y|Q') = I(X;Y) - I(XY;Q').$$
Also, $I(XY;Q) \le I(XY;QQ') = I(XY;Q')$, where we used the fact that $Q - Q' - XY$, and hence $I(XY;QQ') = I(XY;Q') + I(XY;Q|Q') = I(XY;Q')$. Thus
$$I(X;Y|Q) - I(X;Y|Q') = I(Y;Q|X) + I(X;Q|Y) - I(XY;Q) + I(XY;Q') \ge 0.$$
■

Proof of Theorem 2.2: To prove (5), first note that
$$T_1^{\text{int}}(X;Y) = \inf_{\substack{p_{Q|XY}:\\ I(X;Q|Y)=0\\ I(X;Y|Q)=0}} I(Y;Q|X) \;\le\; \inf_{\substack{p_{Q|XY}:\\ H(Q|Y)=0\\ I(X;Y|Q)=0}} H(Q|X),$$
because if $H(Q|Y) = 0$, then $I(X;Q|Y) = 0$ and $I(Y;Q|X) = H(Q|X)$. For the other direction, we invoke Lemma A.1 (with $X$ and $Q$ interchanged), so that, given $Q$ such that $I(X;Q|Y) = I(X;Y|Q) = 0$, there exists $Q'$ such that $H(Q'|Y) = H(Q'|Q) = 0$ and $X - Q' - QY$; then $H(Q'|X) = I(Y;Q'|X) \le I(Y;Q|X)$, and $X - Q' - Y$. So $Q'$ is considered in the inf expression on the RHS, and we have LHS $\ge$ RHS. This proves (5). Similarly, (6) holds.

To prove (7), first we note that
$$T_3^{\text{int}}(X;Y) = \inf_{\substack{p_{Q|XY}:\\ I(Y;Q|X)=0\\ I(X;Q|Y)=0}} I(X;Y|Q) \;\le\; \inf_{\substack{p_{Q|XY}:\\ H(Q|X)=0\\ H(Q|Y)=0}} I(X;Y|Q),$$
since $H(Q|X) = H(Q|Y) = 0$ implies that $I(Y;Q|X) = I(X;Q|Y) = 0$. For the inequality in the other direction, by Lemma A.1, given $Q$ such that $I(Y;Q|X) = I(X;Q|Y) = 0$, we get $Q'$ such that $H(Q'|X) = H(Q'|Y) = 0$ and $Q - Q' - XY$; then, by Lemma A.2, it follows that $I(X;Y|Q) \ge I(X;Y|Q')$. Hence,
$$\inf_{\substack{p_{Q|XY}:\\ I(Y;Q|X)=I(X;Q|Y)=0}} I(X;Y|Q) \;\ge\; \inf_{\substack{p_{Q|XY}:\\ H(Q|X)=H(Q|Y)=0}} I(X;Y|Q).$$
Thus, (7) holds.

■

Proof of Theorem 2.3: Consider any two points $s_1, s_2 \in \mathfrak{T}(X;Y)$ and any point $s = \alpha s_1 + (1-\alpha)s_2$ with $0 \le \alpha \le 1$. We need to show that $s \in \mathfrak{T}(X;Y)$ as well.

Since $s_1, s_2 \in \mathfrak{T}(X;Y)$, there are $p_{Q_1|XY}$ and $p_{Q_2|XY}$ such that $s'_1 := T(X;Y|Q_1) \le s_1$ and $s'_2 := T(X;Y|Q_2) \le s_2$. Let $J$ be a binary random variable, independent of $(X, Y, Q_1, Q_2)$, taking the value 1 with probability $\alpha$ and 2 with probability $1 - \alpha$. Let $Q = (J, Q_J)$. Then $T(X;Y|Q) = \alpha T(X;Y|Q_1) + (1-\alpha)T(X;Y|Q_2)$. That is, $s' = \alpha s'_1 + (1-\alpha)s'_2$ is in $\mathfrak{T}(X;Y)$. Hence $s \in \mathfrak{T}(X;Y)$, since $s \ge s'$ and $\mathfrak{T}(X;Y)$ is upward closed. ■
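The time-sharing identity $T(X;Y|(J,Q_J)) = \alpha T(X;Y|Q_1) + (1-\alpha)T(X;Y|Q_2)$ used in this proof can be checked numerically. In the sketch below (our own illustration), the auxiliary variable is stored as the pair `(j, q)`, and $p_{Q_1|XY}$, $p_{Q_2|XY}$, and $\alpha$ are drawn at random.

```python
import itertools
import math
import random
from collections import defaultdict


def entropy(pmf, idx):
    """Entropy in bits of the marginal of `pmf` on the coordinate indices `idx`."""
    marg = defaultdict(float)
    for outcome, p in pmf.items():
        marg[tuple(outcome[i] for i in idx)] += p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)


def cmi(pmf, a, b, c=()):
    """Conditional mutual information I(A;B|C) in bits; a, b, c are index tuples."""
    return (entropy(pmf, a + c) + entropy(pmf, b + c)
            - entropy(pmf, a + b + c) - entropy(pmf, c))


def rand_dist(rng, n):
    """A random, strictly positive pmf on {0, ..., n-1}, returned as a list."""
    w = [rng.random() + 1e-3 for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]


def random_joint(rng, sizes):
    """A random, strictly positive pmf on the product of range(s) for s in `sizes`."""
    outcomes = list(itertools.product(*(range(s) for s in sizes)))
    w = [rng.random() + 1e-3 for _ in outcomes]
    total = sum(w)
    return {o: wi / total for o, wi in zip(outcomes, w)}


def tension(pmf):
    """T(X;Y|Q) = (I(Y;Q|X), I(X;Q|Y), I(X;Y|Q)) for a pmf over (X, Y, Q)."""
    X, Y, Q = (0,), (1,), (2,)
    return (cmi(pmf, Y, Q, X), cmi(pmf, X, Q, Y), cmi(pmf, X, Y, Q))


def time_sharing_gap(rng, alpha, nx=2, ny=3, nq=3):
    """Compare T(X;Y|(J,Q_J)) with alpha*T(X;Y|Q1) + (1-alpha)*T(X;Y|Q2)."""
    pxy = random_joint(rng, (nx, ny))
    c1 = {xy: rand_dist(rng, nq) for xy in pxy}  # p_{Q1|XY}
    c2 = {xy: rand_dist(rng, nq) for xy in pxy}  # p_{Q2|XY}
    p1 = {(x, y, q): pr * c1[(x, y)][q] for (x, y), pr in pxy.items() for q in range(nq)}
    p2 = {(x, y, q): pr * c2[(x, y)][q] for (x, y), pr in pxy.items() for q in range(nq)}
    # Q = (J, Q_J), with J independent of (X, Y, Q1, Q2) and P(J = 1) = alpha.
    pq = {(x, y, (1, q)): alpha * pr for (x, y, q), pr in p1.items()}
    pq.update({(x, y, (2, q)): (1 - alpha) * pr for (x, y, q), pr in p2.items()})
    mix = [alpha * a + (1 - alpha) * b for a, b in zip(tension(p1), tension(p2))]
    return max(abs(t - m) for t, m in zip(tension(pq), mix))


def worst_time_sharing_gap(trials=30, seed=0):
    rng = random.Random(seed)
    return max(time_sharing_gap(rng, rng.random()) for _ in range(trials))


print(worst_time_sharing_gap())
```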

Lemma A.3: If $A \subseteq \mathbb{R}^m$ is compact, then its increasing hull,
$$i(A) = \{x \in \mathbb{R}^m : x \ge a \text{ for some } a \in A\},$$
is closed.

Proof: Our proof of this simple fact is along the lines of [21, Proposition 4, pg. 44]. Consider an arbitrary point $a \in \mathbb{R}^m \setminus i(A)$. We will show that there exists a neighbourhood $V$ of $a$ such that $V \cap i(A) = \emptyset$.

For any point $x \in A$, we have $a \not\ge x$ (coordinate-wise); i.e., for some $j \in \{1, \ldots, m\}$, $a_j < x_j$. Let $\xi = x_j - a_j > 0$. Let $V_x = \{a' : \|a' - a\| < \xi/3\}$ and $W_x = \{x' : \|x' - x\| < \xi/3\}$ be neighbourhoods around $a$ and $x$. Then $V_x \cap i(W_x) = \emptyset$: for any $a' \in V_x$ and $x'' \in i(W_x)$, we have $a' \not\ge x''$, because $a'_j < a_j + \xi/3 < x_j - \xi/3$, but since $x'' \ge x'$ for some $x' \in W_x$, $x''_j \ge x'_j > x_j - \xi/3$.

Since $\{W_x : x \in A\}$ is an open cover of $A$, compactness of $A$ implies that there is a finite $n$ and points $x_1, \ldots, x_n$ such that
$$A \subseteq \bigcup_{k=1}^{n} W_{x_k},$$
which in turn implies that
$$i(A) \subseteq \bigcup_{k=1}^{n} i(W_{x_k}).$$
Let
$$V = \bigcap_{k=1}^{n} V_{x_k}.$$
Clearly, $V$ is a neighbourhood of $a$, and we have
$$V \cap i(A) \subseteq \bigcup_{k=1}^{n} \big(V_{x_k} \cap i(W_{x_k})\big) = \emptyset.$$
Hence $i(A)$ is closed. ■
The following simple (and standard) observation is 
used in proving Lemma 2.6. 

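Lemma A.4: If $p_Z$ and $p_{Z'}$ are such that $\Delta(Z,Z') = \epsilon$, then there is a joint distribution $p_{JWW'}$ such that $p_W = p_Z$, $p_{W'} = p_{Z'}$, $p_J(0) = \epsilon$ and $p_J(1) = 1 - \epsilon$, and $J = 1 \Rightarrow W = W'$.

Proof: First, we define independent random variables $J$, $W_0$, $W_1$ and $W_2$ (the first one over $\{0,1\}$ and the others over the common alphabet of $Z$ and $Z'$) as follows (assuming $0 < \epsilon < 1$; the cases $\epsilon \in \{0, 1\}$ are trivial):
$$\begin{aligned}
p_J(0) &= \epsilon, \qquad p_J(1) = 1 - \epsilon,\\
p_{W_0}(z) &= \frac{\min\{p_Z(z), p_{Z'}(z)\}}{1 - \epsilon},\\
p_{W_1}(z) &= \frac{p_Z(z) - (1 - \epsilon)\,p_{W_0}(z)}{\epsilon},\\
p_{W_2}(z) &= \frac{p_{Z'}(z) - (1 - \epsilon)\,p_{W_0}(z)}{\epsilon}.
\end{aligned}$$
These are valid p.m.f.s since $\sum_z \min\{p_Z(z), p_{Z'}(z)\} = 1 - \epsilon$. We define $W$ and $W'$ in terms of these random variables: when $J = 1$, $W = W' = W_0$, and when $J = 0$, we set $W = W_1$ and $W' = W_2$. It is easy to verify that the resulting random variables have the correct marginals.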
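The construction in Lemma A.4 is the standard maximal coupling. The sketch below (our own illustration; `coupling` and `marginal` are hypothetical helper names) builds the joint p.m.f. of $(J, W, W')$ exactly as in the proof, assuming $\Delta$ is normalized as half the $\ell_1$ distance, and checks the stated properties on a small example.

```python
from collections import defaultdict


def tv_distance(p, q):
    """Total variation distance (1/2) * sum_z |p(z) - q(z)| between two finite pmfs."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(z, 0.0) - q.get(z, 0.0)) for z in support)


def coupling(pz, pz2):
    """Joint pmf over (J, W, W') following Lemma A.4; returns (eps, joint)."""
    eps = tv_distance(pz, pz2)
    support = sorted(set(pz) | set(pz2))
    overlap = {z: min(pz.get(z, 0.0), pz2.get(z, 0.0)) for z in support}  # (1 - eps) p_{W0}(z)
    rest1 = {z: pz.get(z, 0.0) - overlap[z] for z in support}               # eps * p_{W1}(z)
    rest2 = {z: pz2.get(z, 0.0) - overlap[z] for z in support}              # eps * p_{W2}(z)
    joint = defaultdict(float)
    for z in support:
        if overlap[z] > 0:
            joint[(1, z, z)] += overlap[z]  # J = 1: W = W' = W0
    if eps > 0:
        for z in support:
            for z2 in support:
                if rest1[z] > 0 and rest2[z2] > 0:
                    # J = 0: W = W1 and W' = W2, drawn independently
                    joint[(0, z, z2)] += rest1[z] * rest2[z2] / eps
    return eps, dict(joint)


def marginal(joint, i):
    """Marginal of coordinate i (0 = J, 1 = W, 2 = W') of a joint pmf."""
    out = defaultdict(float)
    for key, pr in joint.items():
        out[key[i]] += pr
    return dict(out)


pz = {"a": 0.5, "b": 0.3, "c": 0.2}
pz2 = {"a": 0.2, "b": 0.3, "c": 0.5}
eps, joint = coupling(pz, pz2)
print(eps, marginal(joint, 0))  # eps is about 0.3, and P(J = 1) is about 0.7
```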



Lemma 2.6: Suppose random variables $(A, B, C)$ and $(A', B', C')$ over the same alphabet $\mathcal{A} \times \mathcal{B} \times \mathcal{C}$ are such that $\Delta(ABC, A'B'C') = \epsilon$. Then
$$I(A';B'|C') \le I(A;B|C) + 2H_2(\epsilon) + \epsilon\log\min\{|\mathcal{A}|, |\mathcal{B}|\}.$$

Proof: We apply Lemma A.4 with $Z = (A, B, C)$ and $Z' = (A', B', C')$ to obtain a joint distribution $p_{J,A,B,C,A',B',C'}$ such that $J = 1 \Rightarrow (A,B,C) = (A',B',C')$, where this event occurs with probability $1 - \epsilon$. Now, note that
$$\begin{aligned}
I(A;B|C) &= I(A;BJ|C) - I(A;J|BC)\\
&= I(A;B|CJ) + I(A;J|C) - I(A;J|BC).
\end{aligned}$$
Since $0 \le I(A;J|C) \le H(J)$ and $0 \le I(A;J|BC) \le H(J)$, we have
$$|I(A;B|C) - I(A;B|CJ)| \le H(J) = H_2(\epsilon). \tag{53}$$
The same relation holds for $A', B', C'$ instead of $A, B, C$. Hence
$$\begin{aligned}
I(A';B'|C') &\le I(A';B'|C'J) + H_2(\epsilon)\\
&= (1-\epsilon)I(A';B'|C',J=1) + \epsilon I(A';B'|C',J=0) + H_2(\epsilon)\\
&= (1-\epsilon)I(A;B|C,J=1) + \epsilon I(A';B'|C',J=0) + H_2(\epsilon)\\
&= I(A;B|CJ) - \epsilon I(A;B|C,J=0) + \epsilon I(A';B'|C',J=0) + H_2(\epsilon)\\
&\overset{(a)}{\le} I(A;B|C) + \epsilon I(A';B'|C',J=0) + 2H_2(\epsilon)\\
&\le I(A;B|C) + \epsilon\min\{\log|\mathcal{A}|, \log|\mathcal{B}|\} + 2H_2(\epsilon),
\end{aligned}$$
where (a) follows from (53) and the non-negativity of $I(A;B|C,J=0)$. ■
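A hedged numerical spot-check of Lemma 2.6 (our own illustration): it draws a random $(A,B,C)$ with $|\mathcal{A}| = 2$, $|\mathcal{B}| = 3$, $|\mathcal{C}| = 2$, perturbs it by mixing with another random p.m.f., computes $\epsilon = \Delta(ABC, A'B'C')$ exactly, and reports the smallest value of the bound minus $I(A';B'|C')$, with logarithms in base 2.

```python
import itertools
import math
import random
from collections import defaultdict


def entropy(pmf, idx):
    """Entropy in bits of the marginal of `pmf` on the coordinate indices `idx`."""
    marg = defaultdict(float)
    for outcome, p in pmf.items():
        marg[tuple(outcome[i] for i in idx)] += p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)


def cmi(pmf, a, b, c=()):
    """Conditional mutual information I(A;B|C) in bits; a, b, c are index tuples."""
    return (entropy(pmf, a + c) + entropy(pmf, b + c)
            - entropy(pmf, a + b + c) - entropy(pmf, c))


def random_joint(rng, sizes):
    """A random, strictly positive pmf on the product of range(s) for s in `sizes`."""
    outcomes = list(itertools.product(*(range(s) for s in sizes)))
    w = [rng.random() + 1e-3 for _ in outcomes]
    total = sum(w)
    return {o: wi / total for o, wi in zip(outcomes, w)}


def tv_distance(p, q):
    """Total variation distance (1/2) * sum_z |p(z) - q(z)|."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(z, 0.0) - q.get(z, 0.0)) for z in support)


def h2(p):
    """Binary entropy in bits, with H2(0) = H2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def lemma_2_6_slack(rng, na=2, nb=3, nc=2):
    """Right-hand side minus left-hand side of Lemma 2.6 for one random instance."""
    p = random_joint(rng, (na, nb, nc))
    q = random_joint(rng, (na, nb, nc))
    t = rng.random()
    p_prime = {o: (1 - t) * p[o] + t * q[o] for o in p}  # distribution of (A', B', C')
    eps = tv_distance(p, p_prime)
    A, B, C = (0,), (1,), (2,)
    bound = cmi(p, A, B, C) + 2 * h2(eps) + eps * math.log2(min(na, nb))
    return bound - cmi(p_prime, A, B, C)


def worst_lemma_2_6_slack(trials=200, seed=0):
    rng = random.Random(seed)
    return min(lemma_2_6_slack(rng) for _ in range(trials))


print(worst_lemma_2_6_slack())  # should be non-negative
```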

Appendix B 
Details Omitted from Section III 

Proof of Corollary 3.2: The first equation (12) follows immediately from Theorem 3.1. We need to show (13), which is repeated below for convenience:
$$\mathfrak{T}(X;Y) = i\big(f_{X,Y}(\mathcal{R}_{\text{AC}}(X;Y))\big), \tag{13}$$
where $f_{X,Y}$ is an affine map defined as
$$f_{X,Y}\begin{pmatrix} R_1 \\ R_2 \\ R_{C1} \end{pmatrix} = \begin{pmatrix} R_1 \\ R_2 \\ I(X;Y) + R_1 + R_2 - R_{C1} \end{pmatrix}.$$
Given a $p_{Q|XY}$ and $(r_1, r_2, r_{C1})$ such that $r_1 \ge I(Y;Q|X)$, $r_2 \ge I(X;Q|Y)$ and $r_{C1} \le I(XY;Q)$, we have
$$\begin{aligned}
r_1 + r_2 - r_{C1} + I(X;Y) &\ge I(Y;Q|X) + I(X;Q|Y) - I(XY;Q) + I(X;Y)\\
&= I(X;Y|Q),
\end{aligned}$$
where the last equality is (52). Hence $f_{X,Y}(r_1, r_2, r_{C1}) \ge T(X;Y|Q)$ coordinate-wise, so it lies in $\mathfrak{T}(X;Y)$. Thus, L.H.S. $\supseteq$ R.H.S.

If $(r_1, r_2, r_3) \in \mathfrak{T}(X;Y)$, then there is a $p_{Q|XY}$ such that $r_1 \ge I(Y;Q|X)$, $r_2 \ge I(X;Q|Y)$ and $r_3 \ge I(X;Y|Q)$. But, since (52) implies that $(I(Y;Q|X), I(X;Q|Y), I(X;Y|Q)) \in f_{X,Y}(\mathcal{R}_{\text{AC}}(X;Y))$, we have $(r_1, r_2, r_3) \in i(f_{X,Y}(\mathcal{R}_{\text{AC}}(X;Y)))$. Thus, L.H.S. $\subseteq$ R.H.S.

■

Proof of Corollary 3.3: From the definitions it is clear that $C_{\text{GK}}(X;Y) \le R^{\text{AC}}_{C1}(X;Y)$. But, as we will show, this is in fact an equality. Theorem 3.1 implies that
$$R^{\text{AC}}_{C1}(X;Y) = \max_{\substack{p_{Q|XY}:\\ I(X;Q|Y)=I(Y;Q|X)=0}} I(XY;Q). \tag{54}$$
By Lemma A.1, given $p_{Q|XY}$ such that $I(X;Q|Y) = I(Y;Q|X) = 0$, we can find a random variable $Q'$ with $H(Q'|X) = H(Q'|Y) = 0$ such that $Q - Q' - (X,Y)$ is a Markov chain. Then, clearly, $I(X;Q'|Y) = I(Y;Q'|X) = 0$, and furthermore
$$I(XY;Q) \le I(XY;QQ') = I(XY;Q') = H(Q').$$
Hence,
$$R^{\text{AC}}_{C1}(X;Y) = \max_{\substack{p_{Q'|XY}:\\ H(Q'|X)=H(Q'|Y)=0}} H(Q').$$
Since $H(Q'|X) = H(Q'|Y) = 0$, we have $Q' = f_1(X)$ and $Q' = f_2(Y)$ for some functions $f_1$ and $f_2$, and hence $C_{\text{GK}}(X;Y) \ge H(Q')$. So, $C_{\text{GK}}(X;Y) \ge R^{\text{AC}}_{C1}(X;Y)$. Hence, we can conclude (14)-(15). It only remains to show
$$T_3^{\text{int}}(X;Y) = I(X;Y) - R^{\text{AC}}_{C1}(X;Y). \tag{55}$$
This easily follows from (4) and (54) using (49)-(51).

■

Proof of Lemma 3.5: We are given $p_{XY}$, $p^*_{Q|XY}$, and $d$. Also, we have
$$D^* = \mathbb{E}_{p_{XY}p^*_{Q|XY}}\big[d(X,Y,Q)\big].$$
This proof uses the notion of typicality. We will use notation, definitions, and results from [8]. All typical sequences are defined with respect to the joint distribution $p_{XY}p^*_{Q|XY}$. For a positive integer $k$, we will denote the set $\{1, \ldots, k\}$ by $[k]$.

Random codebook construction: Let $\epsilon' > 0$ and let $p_Q(q) = \sum_{x,y} p_{XY}(x,y)\,p^*_{Q|XY}(q|x,y)$ be the marginal distribution of $Q$ induced by the given joint distribution. Let $r, r_1, r_2$ be such that $r \ge r_1, r_2$. We generate $2^{nr}$ codewords $Q^n(l)$, $l \in [2^{nr}]$, randomly and independently, each according to $\prod_{i=1}^{n} p_Q(q_i)$. The set of indices $l \in [2^{nr}]$ is then partitioned in two different ways into equal-size subsets: 1-bins $\mathcal{B}_1(m_1) = \{(m_1 - 1)2^{n(r - r_1)} + 1, \ldots, m_1 2^{n(r - r_1)}\}$, $m_1 \in [2^{nr_1}]$, and 2-bins $\mathcal{B}_2(m_2) = \{(m_2 - 1)2^{n(r - r_2)} + 1, \ldots, m_2 2^{n(r - r_2)}\}$, $m_2 \in [2^{nr_2}]$.

Encoding: If the input to the encoder is $(x^n, y^n)$, it finds an index $l$ such that $(x^n, y^n, q^n(l)) \in \mathcal{T}_{\epsilon'}^{(n)}(X,Y,Q)$. If none is available, $l$ is chosen uniformly at random from $[2^{nr}]$. The encoder sends to the $k$-th receiver, $k = 1,2$, the bin index $m_k$ such that $l \in \mathcal{B}_k(m_k)$, i.e., $f_k^{(n)}(x^n, y^n) = m_k$, $k = 1,2$.

Decoding: The first decoder, on receiving $m_1$, tries to find a unique $l_1 \in \mathcal{B}_1(m_1)$ such that $(x^n, q^n(l_1)) \in \mathcal{T}_{\epsilon'}^{(n)}(X,Q)$. If it cannot find such an $l_1$, it sets $l_1 = 1$. Decoder 1 outputs $l_1$, i.e., $g_1^{(n)}(x^n, m_1) = l_1$. Similarly, decoder 2 outputs an $l_2$ it finds using $y^n$, $m_2$, and $\mathcal{B}_2$.

Reconstruction: The reconstruction function is defined as $h^{(n)}(l) = q^n(l)$. Thus the output sequence is
$$\hat{q}^n = h^{(n)}(l_1) = q^n(l_1).$$

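Analysis of the probability of error and expected distortion: Let $L, M_1, M_2, L_1, L_2$ be the indices chosen by the encoder and the decoders. We define the error event as
$$\mathcal{E} = \{L_1 \ne L_2\} \cup \big\{(X^n, Y^n, Q^n(L_1)) \notin \mathcal{T}_{\epsilon'}^{(n)}(X,Y,Q)\big\}.$$
Let
$$\begin{aligned}
\mathcal{E}_0 &= \{(X^n, Y^n, Q^n(l)) \notin \mathcal{T}_{\epsilon'}^{(n)} \text{ for all } l \in [2^{nr}]\},\\
\mathcal{E}_1 &= \{(X^n, Q^n(l_1)) \in \mathcal{T}_{\epsilon'}^{(n)} \text{ for some } l_1 \in \mathcal{B}_1(M_1),\ l_1 \ne L\},\\
\mathcal{E}_2 &= \{(Y^n, Q^n(l_2)) \in \mathcal{T}_{\epsilon'}^{(n)} \text{ for some } l_2 \in \mathcal{B}_2(M_2),\ l_2 \ne L\}.
\end{aligned}$$
Since the error event occurs only when $(X^n, Y^n, Q^n(L)) \notin \mathcal{T}_{\epsilon'}^{(n)}$ or at least one of $L_1$ and $L_2$ is different from $L$, we have
$$\mathcal{E} \subseteq \mathcal{E}_0 \cup \mathcal{E}_1 \cup \mathcal{E}_2.$$
By the union bound,
$$\Pr(\mathcal{E}) \le \Pr(\mathcal{E}_0) + \Pr(\mathcal{E}_1) + \Pr(\mathcal{E}_2).$$
By the covering lemma [8, Lemma 3.3], $\Pr(\mathcal{E}_0) \to 0$ as $n \to \infty$ provided $r > I(X,Y;Q) + \delta(\epsilon')$, where $\delta(\epsilon') \to 0$ as $\epsilon' \to 0$. To upper-bound $\Pr(\mathcal{E}_1)$, we claim that
$$\Pr(\mathcal{E}_1) \le \Pr\big(\{(X^n, Q^n(l_1)) \in \mathcal{T}_{\epsilon'}^{(n)} \text{ for some } l_1 \in \mathcal{B}_1(1)\}\big).$$
For a proof, see [8, Lemma 11.1, pg. 284]. For each $l_1 \in \mathcal{B}_1(1)$, the codeword $Q^n(l_1)$ is generated independently of $X^n$ and according to $\prod_{i=1}^{n} p_Q(q_i)$. Note that there are $2^{n(r - r_1)}$ codewords in $\mathcal{B}_1(1)$. By the packing lemma [8, Lemma 3.1], the probability term on the R.H.S. above tends to zero as $n \to \infty$ provided $r - r_1 < I(X;Q) - \delta(\epsilon')$. Similarly, $\Pr(\mathcal{E}_2) \to 0$ as $n \to \infty$ if $r - r_2 < I(Y;Q) - \delta(\epsilon')$. Combining the conditions for all three events (with $r$ chosen slightly larger than $I(X,Y;Q) + \delta(\epsilon')$), we have $\Pr(\mathcal{E}) \to 0$ as $n \to \infty$ provided
$$\begin{aligned}
r_1 &> I(Y;Q|X) + 2\delta(\epsilon'),\\
r_2 &> I(X;Q|Y) + 2\delta(\epsilon').
\end{aligned} \tag{56}$$
We have shown that, when (56) holds, the ensemble average of $\Pr(\mathcal{E})$ over $(2^{nr}, 2^{nr_1}, 2^{nr_2}, n)$ codes converges to zero as $n \to \infty$. Hence, there must exist a sequence of (deterministic) $(2^{nr}, 2^{nr_1}, 2^{nr_2}, n)$ codes such that $\Pr(\mathcal{E}) \to 0$ as $n \to \infty$ if (56) is satisfied.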
Clearly, with an appropriately small choice of $\epsilon'$, this sequence of codes satisfies the rate conditions (21) with $R_1 = I(Y;Q|X)$ and $R_2 = I(X;Q|Y)$, and also the probability of error condition (22). It only remains to verify (23), which we do below:
$$\begin{aligned}
\frac{1}{n}\sum_{i=1}^{n} \mathbb{E}\big[d(X_i,Y_i,\hat{Q}_i)\big]
&\le d_{\max}\Pr(\mathcal{E}) + \mathbb{E}\Big[\frac{1}{n}\sum_{i=1}^{n} d(X_i,Y_i,\hat{Q}_i) \,\Big|\, \mathcal{E}^c\Big]\\
&\le d_{\max}\Pr(\mathcal{E}) + (1+\epsilon')\,\mathbb{E}\big[d(X,Y,Q)\big],
\end{aligned}$$
where the last inequality follows from the typical average lemma [8, pg. 26]. Thus, for a small enough choice of $\epsilon'$, we can satisfy (23) as well, with $D = D^*$. ■
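The binning used in the random codebook construction of this proof reduces to integer arithmetic on indices. A minimal sketch, assuming (as a simplification for illustration) that $nr$ and $nr_k$ are integers, passed as the hypothetical parameters `nr` and `nr_k`:

```python
def bin_index(l, nr, nr_k):
    """Index m_k of the k-bin containing codeword index l in [2^{nr}] (1-based),
    when [2^{nr}] is split into 2^{nr_k} consecutive bins of size 2^{nr - nr_k}."""
    size = 2 ** (nr - nr_k)
    return (l - 1) // size + 1


def bin_members(m, nr, nr_k):
    """The bin B_k(m) = {(m - 1) 2^{n(r - r_k)} + 1, ..., m 2^{n(r - r_k)}} as a range."""
    size = 2 ** (nr - nr_k)
    return range((m - 1) * size + 1, m * size + 1)


# Hypothetical toy sizes: n*r = 4 (16 codewords) and n*r_1 = 2 (4 bins of size 4).
print([bin_index(l, 4, 2) for l in range(1, 17)])
print(list(bin_members(3, 4, 2)))  # [9, 10, 11, 12]
```

The encoder's message to receiver $k$ is then `bin_index(l, nr, nr_k)`, and the decoder searches only within `bin_members(m_k, nr, nr_k)`.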

Appendix C 
Details Omitted from Section IV 

Proof of Theorem 4.3: It is easy to prove this theorem from the single-letter expressions for the regions in Theorem 3.1 (along with (12)) and Theorem 4.1, by making use of the mutual information equalities (49)-(51) in Appendix A.

■

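Proof of Corollary 4.4:
$$\begin{aligned}
\sup\{R_C : R_A + R_C = H(X),\ R_B + R_C = H(Y),\ (R_A, R_B, R_C) \in \mathcal{R}_{\text{GW}}\}
&\overset{(a)}{=} \sup\{R : (0, 0, I(X;Y) - R) \in \mathcal{R}'_{\text{GW}}\}\\
&\overset{(b)}{=} \sup\{R : (0, 0, I(X;Y) - R) \in \mathfrak{T}(X;Y)\}\\
&\overset{(c)}{=} C_{\text{GK}}(X;Y),
\end{aligned}$$
where (a) follows from the definition $\mathcal{R}'_{\text{GW}} = f(\mathcal{R}_{\text{GW}})$. The $\le$ direction of (b) follows directly from Theorem 4.3. But $<$ cannot hold, since if $(0, 0, I(X;Y) - R) \in \mathfrak{T}(X;Y)$, then there is an $R' \ge R$ such that $(0, 0, I(X;Y) - R') \in \mathcal{R}'_{\text{GW}}$. Finally, (c) follows from Corollary 3.3.

To arrive at the alternative form, we verify the equivalence of the two forms:
$$\{R : R \le I(X;Y),\ \{R_C = R\} \cap \mathcal{L}_{\text{GW}} \subseteq \mathcal{R}_{\text{GW}}\} = \{R_C : R_A + R_C = H(X),\ R_B + R_C = H(Y),\ (R_A, R_B, R_C) \in \mathcal{R}_{\text{GW}}\}.$$

$\subseteq$: If $R$ belongs to the left-hand side, then $(H(X) - R, H(Y) - R, R) \in \{R_C = R\} \cap \mathcal{L}_{\text{GW}} \subseteq \mathcal{R}_{\text{GW}}$, so $R$ belongs to the right-hand side.

$\supseteq$: Let $s = (H(X) - R_C, H(Y) - R_C, R_C) \in \mathcal{R}_{\text{GW}}$. Then (a) $R_C \le I(X;Y)$ since $s \in \mathcal{L}_{\text{GW}}$, and (b) if $s' = (r_A, r_B, R_C) \in \mathcal{L}_{\text{GW}}$, then since $r_A \ge H(X) - R_C$ and $r_B \ge H(Y) - R_C$, we have $s' \ge s$ (component-wise), which implies that $s' \in \mathcal{R}_{\text{GW}}$ from the definition of the GW system. ■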

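Proof of Corollary 4.5:
$$\begin{aligned}
C_{\text{Wyner}} &= \inf\{R_C : (R_A, R_B, R_C) \in \mathcal{R}_{\text{GW}},\ R_A + R_B + R_C = H(X,Y)\}\\
&\overset{(a)}{=} \inf\{R_1 + R_2 + I(X;Y) : (R_1, R_2, 0) \in \mathcal{R}'_{\text{GW}}\}\\
&\overset{(b)}{=} \inf\{R_1 + R_2 + I(X;Y) : (R_1, R_2, 0) \in \mathfrak{T}(X;Y)\},
\end{aligned}$$
where (a) follows from the definition $\mathcal{R}'_{\text{GW}} = f(\mathcal{R}_{\text{GW}})$, and (b) follows from Theorem 4.3: the $\ge$ direction follows directly from the theorem. But $>$ cannot hold, since, by the theorem, if $(R_1, R_2, 0) \in \mathfrak{T}(X;Y)$, then there exists $(R'_1, R'_2, 0) \in \mathcal{R}'_{\text{GW}}$ such that $R'_1 \le R_1$ and $R'_2 \le R_2$.

■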

Proof of Corollary 4.6:
$$\begin{aligned}
G(Y \to X) &= \inf\{R_C : (H(X|Y), H(Y) - R_C, R_C) \in \mathcal{R}_{\text{GW}}\}\\
&\overset{(a)}{=} \inf\{R : (R - I(X;Y), 0, 0) \in \mathcal{R}'_{\text{GW}}\}\\
&\overset{(b)}{=} \inf\{R : (R - I(X;Y), 0, 0) \in \mathfrak{T}(X;Y)\}\\
&\overset{(c)}{=} I(X;Y) + T_1^{\text{int}}(X;Y),
\end{aligned}$$
where (a) follows from $\mathcal{R}'_{\text{GW}} = f(\mathcal{R}_{\text{GW}})$, (b) is a consequence of Theorem 4.3, and (c) follows from the definition of $T_1^{\text{int}}(X;Y)$.

Similarly, we get (38). The equality (39) is proved in [17], which, along with (37)-(38), implies (40).



Appendix D 
Details Omitted from Section V 

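Here we prove Theorem 5.5. The following lemma will be useful in this.

Lemma D.1: Suppose $\Pi(X,Y) \overset{\epsilon}{\rightsquigarrow} (U,V)$. Then
$$\begin{aligned}
I(\Pi^{\text{view}}_{\text{Alice}}; \Pi^{\text{out}}_{\text{Bob}} \,|\, \Pi^{\text{out}}_{\text{Alice}}) &\le 2\delta(\epsilon),\\
I(\Pi^{\text{view}}_{\text{Bob}}; \Pi^{\text{out}}_{\text{Alice}} \,|\, \Pi^{\text{out}}_{\text{Bob}}) &\le 2\delta(\epsilon),
\end{aligned}$$
where $\delta(\epsilon) = 2H_2(\epsilon) + \epsilon\log\max\{|\mathcal{U}|, |\mathcal{V}|\}$.

Proof: We show $I(\Pi^{\text{view}}_{\text{Alice}}; \Pi^{\text{out}}_{\text{Bob}} | \Pi^{\text{out}}_{\text{Alice}}) \le 2\delta(\epsilon)$ (the other relation follows similarly). Let $S^{\text{view}}_{\text{Alice}}$ be as in Definition 5.4. Then $I(S^{\text{view}}_{\text{Alice}}; V|U) = 0$ and $\Delta(S^{\text{view}}_{\text{Alice}}V, \Pi^{\text{view}}_{\text{Alice}}\Pi^{\text{out}}_{\text{Bob}}) \le \epsilon$. Also, we have $\Delta(UV, \Pi^{\text{out}}_{\text{Alice}}\Pi^{\text{out}}_{\text{Bob}}) \le \epsilon$. Then
$$\begin{aligned}
I(\Pi^{\text{view}}_{\text{Alice}}; \Pi^{\text{out}}_{\text{Bob}} | \Pi^{\text{out}}_{\text{Alice}})
&= I(\Pi^{\text{view}}_{\text{Alice}}; \Pi^{\text{out}}_{\text{Bob}} | \Pi^{\text{out}}_{\text{Alice}}) - I(S^{\text{view}}_{\text{Alice}}; V|U)\\
&\overset{(a)}{=} H(\Pi^{\text{out}}_{\text{Bob}}|\Pi^{\text{out}}_{\text{Alice}}) - H(\Pi^{\text{out}}_{\text{Bob}}|\Pi^{\text{view}}_{\text{Alice}}) - H(V|U) + H(V|U S^{\text{view}}_{\text{Alice}})\\
&= \big[H(\Pi^{\text{out}}_{\text{Bob}}|\Pi^{\text{out}}_{\text{Alice}}) - H(V|U)\big] + \big[H(V|S^{\text{view}}_{\text{Alice}}) - H(\Pi^{\text{out}}_{\text{Bob}}|\Pi^{\text{view}}_{\text{Alice}})\big] - I(V;U|S^{\text{view}}_{\text{Alice}})\\
&\overset{(b)}{\le} 2\delta(\epsilon),
\end{aligned}$$
where in (a) we used $H(\Pi^{\text{out}}_{\text{Bob}}|\Pi^{\text{view}}_{\text{Alice}}\Pi^{\text{out}}_{\text{Alice}}) = H(\Pi^{\text{out}}_{\text{Bob}}|\Pi^{\text{view}}_{\text{Alice}})$ (because $\Pi^{\text{out}}_{\text{Alice}}$ is a function of $\Pi^{\text{view}}_{\text{Alice}}$), and in (b) we dropped the non-negative term $I(V;U|S^{\text{view}}_{\text{Alice}})$ and bounded the two terms in the square brackets by invoking Lemma 2.6 twice, with $((A,B,C),(A',B',C'))$ being $((V,V,U),(\Pi^{\text{out}}_{\text{Bob}},\Pi^{\text{out}}_{\text{Bob}},\Pi^{\text{out}}_{\text{Alice}}))$ and $((\Pi^{\text{out}}_{\text{Bob}},\Pi^{\text{out}}_{\text{Bob}},\Pi^{\text{view}}_{\text{Alice}}),(V,V,S^{\text{view}}_{\text{Alice}}))$, respectively. ■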

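Proof of Theorem 5.5: Suppose there is a protocol $\Pi$ such that $\Pi(X^{n_2},Y^{n_2}) \overset{\epsilon}{\rightsquigarrow} (U^{n_1},V^{n_1})$, for $\frac{n_1}{n_2} > r - \epsilon'$. We will denote the final views of the two parties in this protocol by $(\Pi^{\text{view}}_{\text{Alice}}, \Pi^{\text{view}}_{\text{Bob}})$. Also, we shall denote the outputs by $(\Pi^{\text{out}}_{\text{Alice}}, \Pi^{\text{out}}_{\text{Bob}})$. Then, firstly, by conditions (1) and (2) of Definition 5.2,
$$\mathcal{M}(\Pi^{\text{view}}_{\text{Alice}};\Pi^{\text{view}}_{\text{Bob}}) \supseteq \mathcal{M}(X^{n_2};Y^{n_2}).$$

Secondly, by Lemma D.1, for the random variables $(\Pi^{\text{view}}_{\text{Alice}}, \Pi^{\text{out}}_{\text{Alice}}, \Pi^{\text{out}}_{\text{Bob}}, \Pi^{\text{view}}_{\text{Bob}})$, the hypothesis in condition (3') of Definition 5.6 holds with $\phi = \varphi(\epsilon)\, n_1 \log|\mathcal{U}||\mathcal{V}|$, where we set $\varphi(\epsilon) = 2(2H_2(\epsilon) + \epsilon)$. Hence
$$\mathcal{M}(\Pi^{\text{out}}_{\text{Alice}};\Pi^{\text{out}}_{\text{Bob}}) \supseteq \mathcal{M}(\Pi^{\text{view}}_{\text{Alice}};\Pi^{\text{view}}_{\text{Bob}}) + c\,\varphi(\epsilon)\, n_1 \log|\mathcal{U}||\mathcal{V}|,$$
where $c$ is as in Definition 5.6. Finally, since $\Delta(U^{n_1}V^{n_1}, \Pi^{\text{out}}_{\text{Alice}}\Pi^{\text{out}}_{\text{Bob}}) \le \epsilon$, by the continuity of $\mathcal{M}$ (condition (3'') of Definition 5.6), we have
$$\mathcal{M}(U^{n_1};V^{n_1}) \supseteq \mathcal{M}(\Pi^{\text{out}}_{\text{Alice}};\Pi^{\text{out}}_{\text{Bob}}) + \delta(\epsilon)\, n_1 \log|\mathcal{U}||\mathcal{V}|,$$
where $\delta(\epsilon)$ is as in condition (3'') of Definition 5.6. Putting these together, after dividing throughout by $n_1$ (using condition (4) in Definition 5.2 and convexity from condition (3'')), and using $\frac{n_2}{n_1} < \frac{1}{r - \epsilon'}$, we get
$$\mathcal{M}(U;V) \supseteq \frac{1}{r - \epsilon'}\,\mathcal{M}(X;Y) + \delta'(\epsilon)\log|\mathcal{U}||\mathcal{V}|,$$
where $\delta'(\epsilon) = c\,\varphi(\epsilon) + \delta(\epsilon)$.

If the rate of statistically securely sampling $(U, V)$ from $(X, Y)$ is $r$, then the above relation should hold for all $\epsilon, \epsilon' > 0$. Since $\delta'(\epsilon) \to 0$ as $\epsilon \to 0$ and the regions $\mathcal{M}(U;V)$ and $\mathcal{M}(X;Y)$ are closed (condition (3'')), we get
$$\mathcal{M}(U;V) \supseteq \frac{1}{r}\,\mathcal{M}(X;Y),$$
as required. ■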



